Abstract

We study the power of local information algorithms for optimization problems on social and

technological networks. We focus on sequential algorithms for which the network topology is 
initially unknown and is revealed only within a local neighborhood of vertices that have been

irrevocably added to the output set. The distinguishing feature of this setting is that locality 
is necessitated by constraints on the network information visible to the algorithm, rather than 
being desirable for reasons of efficiency or parallelizability. In this sense, changes to the level of 
network visibility can have a significant impact on algorithm design. 
This framework captures situations in which the optimizer is an external agent that does
not have direct access to the network data, but rather learns about the graph structure only via 
(costly) queries. For instance, a user may wish to strategically find, and form connections to, 
high-degree nodes in an online social network. An appropriate algorithm for this search problem 
must take into account the fact that the structure of the graph is not known in advance, and 
is only revealed locally as nodes are added to the user's set of connections. Given this limited

network visibility, how should the user choose which connections to form? This question is 
relevant not only to the optimizer, but also to the designer of the social network platform who

must decide how much network topology is revealed to individual users.

We study a range of problems under this model of algorithms with local information. We 
first consider the case in which the underlying graph is a preferential attachment network, 
where we show that local information algorithms can perform surprisingly well at various tasks. 
We show that one can find the node of maximum degree in the network in a polylogarithmic 
number of steps, using an opportunistic algorithm that repeatedly queries the visible node of 
maximum degree. This addresses an open question of Bollobas and Riordan on the power 
of local algorithms on preferential attachment graphs. We also show that a polylogarithmic 
number of queries suffices to find a path between any two nodes in the network. In contrast, 
local information algorithms require a linear number of queries to solve these problems on 
arbitrary networks. 

Motivated by problems faced by recruiters in online networks, we also consider network 
coverage problems such as finding a minimum dominating set. For this optimization problem 
we show that, if each node added to the output set reveals sufficient information about the 
set's neighborhood, then it is possible to design randomized algorithms for general networks 
that nearly match the best approximations possible even with full access to the graph structure.
We also show that this level of visibility is necessary: it is impossible to achieve a sublinear 
approximation for general graphs if less information is revealed on each query. We conclude 
that a network provider's decision of how much structure to make visible to its users can have 
a significant effect on a user's ability to interact strategically with the network. 



1 Introduction 



In the past decade there has been a surge of interest in the nature of complex networks that arise in
social and technological contexts; see Easley and Kleinberg [2010] for a recent survey of the topic. In
the computer science community, this attention has been directed largely towards algorithmic issues, 
such as the extent to which network structure can be leveraged into efficient methods for solving 
complex tasks. Common problems include finding influential individuals, detecting communities, 
constructing subgraphs with desirable connectivity properties, etc. 

The standard paradigm in this line of work is that an algorithm has full access to the net- 
work graph structure. Recently there has been significant interest in local algorithms, which are 
roughly characterized by vertex-specific decisions that are based upon local rather than global network
structure. This locality of computation has been motivated by applications to distributed
algorithms [Giakkoupis and Sauerwald 2012; Naor and Stockmeyer 1993], improved runtime efficiency
[Faloutsos et al.; Spielman and Teng 2008], and property testing [Hassidim et al. 2009; Rubinfeld
and Shapira 2011]. In this work we consider a different motivation for these



local methods: in some circumstances, an optimization is being performed by an external user who 
has inherently restricted visibility of the network topology. For such a user, the graph structure is 
revealed only incrementally within a local neighborhood of those nodes for which a connection cost 
has been paid. The use of local algorithms in this setting is therefore necessitated by constraints on 
network visibility, rather than being a means toward an end goal of efficiency or parallelizability. 

As a motivating example, consider an agent in a social network who wishes to find (and link to) 
a highly connected individual. For example, this agent may be a newcomer to a community wanting 
to interact with influential agents, or a recruiter attempting to form strategic connections in a social 
network application. Finding a high-degree node is a straightforward algorithmic problem without 
information constraints, but many online and real-world social networks do not provide enough 
information to permit targeting a specific high-degree vertex. For instance, most online social 
networks display graph structure only within one or two hops from a user's existing connections. 

Is it possible for an agent to solve such a problem using only the local information available 
on an online networking site? This question is relevant not only for individual users, but also to 
the designer of a social networking service who must decide how much information to reveal. For 
example, at the time at which this paper was written, LinkedIn allows each user to see the degree of
nodes two hops away in the network, whereas Facebook does not reveal this information by default. 
We ask: what impact does this design decision have on an individual's ability to interact with the 
network? 

More generally, we consider graph algorithms in a setting of restricted network visibility. We 
focus on optimization problems for which the goal is to return a subset of the nodes in the network; 
this includes coverage, connectivity, and search problems. An algorithm in our framework proceeds 
by incrementally and adaptively building an output set of nodes, corresponding to those vertices 
of the graph that have been queried (or connected to) so far. When the algorithm has queried a 
set S of nodes, the structure of the graph within a small radius of S is revealed, which can guide 
the choice of which nodes to query next. The principal challenge in designing such an algorithm
is that decisions must be based solely on local information, whereas the problem to be solved may 
depend crucially on the global structure of the graph. 

These information restrictions can severely limit the power of graph algorithms. For many 
problems we derive strong lower bounds on the performance of such algorithms in general networks. 
However, our interest is directed mainly at the ability for agents to solve problems locally in natural 
social networks. We therefore turn to the class of preferential attachment graphs, which model the 
structure of many real-world networks such as friendship graphs and World Wide Web connectivity.
For graphs generated via preferential attachment, we demonstrate that local information algorithms 
can do surprisingly well at solving many search and connectivity problems, including shortest path 
routing and finding the k vertices of highest degree (up to a small polylogarithmic factor), for any
fixed k. 

We also consider node coverage problems on general graphs, where the goal is to find a small 
set of nodes whose neighborhood covers all (or much) of the network. Such coverage problems 
are especially motivated in our context by applications to employment-focused social networking 
platforms such as LinkedIn, where there is benefit in having as many nodes as possible within a few
hops of one's direct connections¹. For certain problems of this form, we design local information
algorithms whose performances approximately match the best possible even when information 
about network structure is unrestricted. Additionally, we demonstrate that the amount of local 
information available is critical to the success of such algorithms: strong positive results are possible 
at a certain range of visibility (made explicit below), but non-trivial algorithms become impossible 
when less information is made available. This observation has implications for the design of online 
networks, such as the amount of information to provide a user about the local topology: seemingly 
arbitrary design decisions may have a significant impact on a user's ability to interact with the 
network. 



Results and Techniques Our first set of results concerns local information algorithms for pref- 
erential attachment networks. Such networks are defined by a random process by which nodes 
are added sequentially and form random connections to existing nodes, where the probability of 
connecting to a node is proportional to its degree. 

We first consider the problem of finding the root (i.e. first) node in a preferential attachment 
network. A random walk in a preferential attachment network would encounter the root node 
in $O(\sqrt{n})$ steps (where $n$ is the number of nodes in the network). The question of whether a
better local information algorithm exists for this problem was posed by Bollobas and Riordan, 
who state that "an interesting question is whether a short path between two given vertices can be
constructed quickly using only 'local' information" [Bollobas and Riordan 2004]. They conjecture
that such short paths can be found in $O(\log n)$ steps. We make the first progress towards this
conjecture by showing that polylogarithmic time is sufficient: there is an algorithm that finds the
root of a preferential attachment network in $O(\log^4(n))$ time, with high probability. This then
implies the existence of polylogarithmic approximation algorithms for shortest path connectivity, 
finding the highest-degree node in the network, and other problems. 

The local information algorithm we propose uses a natural greedy approach: at each step, the
algorithm queries the visible node with highest degree. Demonstrating that such an algorithm
reaches the root within polylogarithmically many steps requires a probabilistic analysis of the
preferential attachment process. A natural intuition is that the greedy algorithm will find nodes
of higher degrees over time, making steady progress toward the root node. However, such progress
is impeded by the presence of high-degree nodes that have only low-degree neighbors. What we
must show is that these potential bottlenecks are infrequent enough that they do not significantly
hamper the progress of the algorithm. To this end, we derive a connection between node degree
correlations and supercritical branching processes to prove that a path of relatively high-degree
vertices leading to the root is always available to the algorithm.

1 For example, LinkedIn allows recruiters to execute searches for potential job candidates among all nodes within
distance 3 from the recruiter, additionally displaying resume information for those within distance 2.

We then consider general graphs, where we explore local information algorithms for dominating 
set and coverage problems. A dominating set is a set S such that each node in the network is either in 
S or the neighborhood of S. We design a randomized local information algorithm for the minimum 
dominating set problem that achieves an approximation ratio that nearly matches the lower bound
on polytime algorithms with no information restriction. As has been noted in [Guha and Khuller
1998], the greedy algorithm that repeatedly selects the visible node that maximizes the size of
the dominated set can achieve a very bad approximation factor. We consider a modification of
the greedy algorithm: after each greedy addition of a new node v, the algorithm will also add a 
random neighbor of v. We show that this randomized algorithm obtains an approximation factor 
that matches the known lower bound of $\Omega(\log \Delta)$ (where $\Delta$ is the maximum degree in the network)
up to a constant factor. We also show that having enough local information to choose the node 
that maximizes the incremental benefit to the dominating set size is crucial: no algorithm that can 
see only the degrees of the neighbors of S can achieve a sublinear approximation factor. 
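The randomized modification of the greedy algorithm described in this paragraph can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's formal algorithm: the function name `alternate_random_dominating_set`, the adjacency-map representation, and the tie-breaking rule are hypothetical. For simplicity the marginal gain of each visible node is read from the full adjacency map; whether that gain is actually observable depends on the visibility level, which is the issue studied here.

```python
import random


def alternate_random_dominating_set(adj, seed, rng=None):
    """Greedy step followed by a random-neighbour step, repeated until all nodes are dominated.

    adj  : dict mapping each node to the set of its neighbours
           (assumed undirected and connected; hypothetical representation).
    seed : node at which the output set is initialized.
    """
    rng = rng or random.Random(0)
    chosen = [seed]
    chosen_set = {seed}
    dominated = {seed} | set(adj[seed])

    def add(node):
        # Add a node to the output set and update the dominated set.
        if node not in chosen_set:
            chosen.append(node)
            chosen_set.add(node)
        dominated.add(node)
        dominated.update(adj[node])

    while len(dominated) < len(adj):
        # Visible candidates: dominated nodes that are not yet in the output set.
        candidates = sorted(dominated - chosen_set)
        # Greedy step: pick the candidate that newly dominates the most nodes.
        v = max(candidates, key=lambda x: len(set(adj[x]) - dominated))
        add(v)
        # Randomized step: also add a uniformly random neighbour of v.
        add(rng.choice(sorted(adj[v])))
    return chosen
```

On a connected graph the greedy step always gains at least one node, so the loop terminates; the sketch makes no claim about the approximation factor, which is what the analysis in the paper establishes.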

Finally, we extend these results to related coverage problems. For the partial dominating 
set problem (where the goal is to cover a given constant fraction of the network with as few 
nodes as possible) we give an impossibility result: no local information algorithm can obtain an 
approximation better than $O(\sqrt{n})$ on networks with $n$ nodes. However, a slight modification to
the local information algorithm for minimum dominating set yields a bicriteria result (in which we
compare performance against an adversary who must cover an additional $\epsilon$ fraction of the network).
We also consider the "neighbor-collecting" problem, in which the goal is to minimize $c|S|$ plus the
number of nodes left undominated by $S$, for a given parameter $c$. For this problem we show that the
minimum dominating set algorithm yields an $O(c \log \Delta)$ approximation (where $\Delta$ is the maximum
degree in the network), and we show that the dependence on c is unavoidable for local information 
algorithms. 

Related Work Over the last decade there has been a substantial body of work on understanding 
the ability to approximate solutions to problems in sublinear time. In the context of graphs, the goal 
is to understand how well one can approximate a property of the graph using a sublinear number
of certain graph queries. See Rubinfeld and Shapira [2011] and Goldreich [2010] for recent surveys.



In the context of social networks, a recent work has suggested the Jump and Crawl model, where
algorithms have no direct access to the network but can either sample a node uniformly (a Jump)
or access a neighbor of a previously discovered node (a Crawl) [Brautbar and Kearns 2010]. Local



information algorithms can be thought of as generalizing the Jump and Crawl query framework 
to include an informational dimension. A Crawl query will now return any node in the local 
neighborhood of nodes seen so far while Jump queries would allow access to unexplored regions of 
the network. 



A notion of local computation algorithms was recently formalized by [Rubinfeld et al. 2011]
and was further developed in [Alon et al. 2012]. A local computation algorithm must compute
only certain specified bits of a global problem solution. In such algorithms an individual piece of 
the output is most often determined from its local neighborhood. While the main motivation of 
local computation algorithms is to distribute computation in order to compute the output value 
on distant locations, the motivation in our work is to capture the way a sequential algorithm, 
constrained to using only local information, can solve a problem. As a result, the informational 
dimension of visibility tends not to play a role in the analysis of local computation algorithms. 

Local algorithms motivated by efficient computation, rather than informational constraints,
were explored by [Andersen et al. 2006; Spielman and Teng 2008]. These works explore the ability
to approximate a graph partition locally in order to efficiently find a global solution. In particular,
they explore the ability to find a cluster containing a given vertex by querying only close-by nodes.

Preferential attachment networks were suggested by Barabasi and Albert [1999] as a model for
large social networks. There has been much work on understanding the properties of such networks,
such as their degree distribution [Bollobas et al. 2001] and diameter [Bollobas and Riordan 2004];
see Bollobas [2003] for a short survey. The problem of finding nodes with competitively high degree
in graphs, using only Jump and Crawl network queries, is explored in [Brautbar and Kearns 2010].
The question of whether a polylogarithmic time Jump and Crawl algorithm exists for finding a
high degree node in preferential attachment graphs was left open therein.

The low diameter of preferential attachment graphs has brought a surge of interest in distributed
algorithms that solve problems in a small number of rounds. In each round, every node exchanges
information only with its neighbors. A recent
work [Doerr et al. 2011] showed that such algorithms can be used for fast rumor spreading. Our
results on the ability to find short paths in such graphs are different in that our algorithms proceed
in a sequential way with a small total number of queries, making broadcast-style results insufficient.

The ability to quickly find short paths in social networks has been the focus of much study,
especially in the context of small-world graphs [Giakkoupis and Schabanel 2011; Kleinberg 2000].
Inspired by Milgram's experiment on local routing, Kleinberg models a small world as a grid, where
only a sparse number of extra contacts exist between distant individuals. He then shows that
local routing using short paths is possible in certain cases. However, this result and its follow-up
work require some awareness of global network structure, as the global grid address of each of
one's acquaintances is known to each individual. Importantly, our algorithm that finds short paths between
individuals in preferential attachment graphs needs no global information at all, only that the
degrees of an individual's neighbors are known to that individual. We note that our result requires that routing
can be done from both endpoints: i.e., the nodes are trying to find each other, rather than simply
one trying to find the other.

For the minimum dominating set problem, Guha and Khuller [1998] designed
a local $O(\log \Delta)$ approximation algorithm (where $\Delta$ is the maximum degree in the network). As
a local information algorithm, their method requires that the network structure is revealed up to 
distance two from the current dominating set. By contrast, the local information algorithm that 
we design for the dominating set problem requires that less information be revealed on each step. 



Organization In Section 2 we formally define the notion of local information algorithms and
discuss our algorithmic framework. In Section 3 we present an algorithm to find the root of a
preferential attachment network in $O(\log^4(n))$ steps, then in Section 4 we discuss applications to
other algorithmic problems. In Section 5 we discuss local information algorithms for the minimum
dominating set on arbitrary graphs, and in Section 6 we extend these results to other graph coverage
problems.

2 Model and Preliminaries 

2.1 Notation 

We will denote an undirected graph by $G = (V, E)$, where $V$ and $E$ are the node and edge sets
respectively. We denote the number of nodes in a graph by $n(G)$.

For a vertex $v \in V$, we will write $d_G(v)$ for the degree of $v$ in $G$, and $N_G(v)$ for the set of
neighbors of $v$. Given a subset of vertices $S \subseteq V$, we write $N_G(S)$ for the set of nodes that are
adjacent to at least one node in $S$. We also write $D_G(S)$ for the set of nodes dominated by $S$; that
is, $D_G(S) = N_G(S) \cup S$. We say $S$ is a dominating set if $D_G(S) = V$. Finally, we write $\Delta_G$ for the
maximum degree in graph $G$. In all of the above notation we will often suppress the dependency
on $G$ when clear from context.

2.2 Algorithmic Framework 

We consider graph optimization problems in which the goal is to return a minimal-cost² set of
vertices S satisfying a feasibility constraint. This class includes many natural coverage, search, and 
connectivity problems on graphs. The definition captures our notion of an algorithm that proceeds 
under local information constraints. 

We begin with a definition of local neighborhoods. 

Definition 1. The distance of v from set S of nodes in a graph G is the minimum, over all nodes 
$u \in S$, of the shortest path length from $v$ to $u$ in $G$.

Definition 2 (Local Neighborhood). Given a set of nodes S in the graph G, the r-open neighborhood 
around S is the induced subgraph of G containing all nodes up to distance r from S, where edges 
between nodes at distance exactly r from S are removed. In addition, the degree of each node at 
distance r from S is given. The r-closed neighborhood around S is the r-open neighborhood around
S that includes all edges between nodes at distance exactly r from S. 
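Definitions 1 and 2 can be made concrete with a short breadth-first search. The following is a minimal Python sketch, assuming the graph is given as a dictionary from nodes to neighbour sets; the function names `distances_from_set` and `local_neighborhood` and the return format are hypothetical choices made for illustration only.

```python
from collections import deque


def distances_from_set(adj, S, r):
    """Multi-source BFS from S, truncated at distance r (Definition 1)."""
    dist = {s: 0 for s in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not explore beyond distance r
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def local_neighborhood(adj, S, r, closed=False):
    """Return (nodes, edges, boundary_degrees) of the r-open or r-closed neighborhood."""
    dist = distances_from_set(adj, S, r)
    edges = set()
    for u in dist:
        for v in adj[u]:
            if v not in dist:
                continue
            # The r-open neighborhood drops edges between two nodes at distance exactly r.
            if not closed and dist[u] == r and dist[v] == r:
                continue
            edges.add(frozenset((u, v)))
    # Degrees of the nodes at distance exactly r are revealed in both variants.
    boundary_degrees = {u: len(adj[u]) for u, d in dist.items() if d == r}
    return set(dist), edges, boundary_degrees
```

For example, on a triangle 0-1-2 with a pendant node 3 attached to node 2, the 1-open neighborhood of {0} contains nodes 0, 1, 2 and the edges {0,1} and {0,2}; the edge {1,2} appears only in the 1-closed neighborhood.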

Definition 3 (Local Information Algorithm). Let G be an undirected graph unknown to the 
algorithm where each vertex is assigned a unique identifier. 

For integer $r \ge 1$, we say a (possibly randomized) algorithm is an $r$-local algorithm for an
optimization problem P if: 

• The algorithm proceeds sequentially, growing step-by-step a set S of nodes, where S is initial- 
ized to some seed node. 

• Given that the algorithm has queried a set S of nodes so far, it can only observe the r-open 
neighborhood around S. 

• The algorithm, guided by its local information, can add nodes to S via Jump and Crawl 
queries. A Jump returns a vertex chosen uniformly at random from all graph nodes. A Crawl 
returns a specified neighbor from the r-open neighborhood around S. 

• At the end of its execution the algorithm outputs the set S as its solution to P.

2 In most of the problems we consider, the cost of set S will simply be

Similarly, for integral $r \ge 1$, we call an algorithm an $r^+$-local algorithm for a problem P if its
local information is made from the r-closed neighborhood around S. 
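The query interface of Definition 3 can be sketched as a small oracle that hides the graph and exposes only what a 1-local algorithm is allowed to see. This is a hedged illustration for the case r = 1 only: the class name `LocalOracle` and its method names are hypothetical, and the cost is taken to be the number of nodes in S.

```python
import random


class LocalOracle:
    """Hides a graph and exposes only the 1-open view around the queried set S."""

    def __init__(self, adj, seed, rng=None):
        self._adj = adj                      # full graph, never handed to the algorithm
        self._rng = rng or random.Random(0)
        self.S = [seed]                      # S is initialized to a seed node

    def visible(self):
        """Nodes at distance exactly 1 from S, together with their degrees."""
        in_S = set(self.S)
        frontier = set()
        for s in self.S:
            frontier |= set(self._adj[s])
        return {v: len(self._adj[v]) for v in frontier - in_S}

    def crawl(self, v):
        """Add a specified visible node to S."""
        if v not in self.visible():
            raise ValueError("crawl target is not in the visible neighborhood")
        self.S.append(v)

    def jump(self):
        """Add a node chosen uniformly at random from all graph nodes."""
        v = self._rng.choice(sorted(self._adj))
        if v not in self.S:
            self.S.append(v)
        return v

    def cost(self):
        """Number of queried nodes, i.e. the size of S."""
        return len(self.S)
```

An algorithm written against this interface can only base its choices on `visible()`, which is the sense in which locality is forced by information constraints rather than chosen for efficiency.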

Our framework applies most naturally to coverage, search, and connectivity problems over the 
network vertices, where the family of valid solutions is upward-closed. 

More generally, it is suitable for measuring the complexity, using only local information, for 
finding a subset of nodes having a desirable property. In this case the size of S measures the 
number of queries made by the algorithm; we think of the graph structure revealed to the algorithm 
as having been paid for by the cost of S. 

For our lower bound results, we will sometimes compare the performance of an r-local algorithm 
with that of a (possibly randomized) algorithm that is also limited to using Jump and Crawl 
queries, but may use its full knowledge of the network topology to guide its query decisions. Such 
an algorithm would need to construct an optimal solution to the problem at hand by using the 
minimum expected number of Jump and Crawl queries. The purpose of such comparisons is to 
emphasize instances where it is the lack of information about the network structure, rather than the 
necessity of building the output in a local manner, that impedes an algorithm's ability to perform 
an optimization task. 



3 Preferential Attachment Graphs 

We now focus our attention on algorithms for graphs generated by the preferential attachment
process. The preferential attachment process was conceived by Barabasi and Albert [1999].
Informally, the process is defined sequentially, with nodes being added one after the other.
When a node is added it sends m links backward to previously created nodes, where the probability
of connecting to a node is proportional to its current degree.



We will use the following formal definition of the process, due to Bollobas and Riordan [2004].
We begin with the case $m = 1$. Consider a fixed sequence of nodes $1, 2, \ldots, n$. We shall inductively
define a random graph process $G_1^t$, $1 \le t \le n$, as follows. Start with $G_1^1$ as the graph with one node
with a self-loop. Given $G_1^{t-1}$, form $G_1^t$ by adding the node $t$ together with a single edge from $t$ to
$s$, where $s$ is chosen randomly with

$$P(s \text{ is chosen}) = \begin{cases} \deg_{G_1^{t-1}}(s)/(2t-1) & \text{if } 1 \le s \le t-1, \\ 1/(2t-1) & \text{if } s = t, \end{cases}$$

where $\deg_{G_1^{t-1}}(s)$ denotes the degree of node $s$ in $G_1^{t-1}$.

For $m > 1$ the process $G_m^t$, $1 \le t \le n$, is defined similarly, with the change that when node $t$ is
added we create $m$ edges instead of one from $t$, one at a time, each time counting the previously
added edges (including self-loops) as already contributing to the current degree of nodes.
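The process just defined can be simulated with a standard "list of edge endpoints" trick. The following Python sketch is one possible implementation under stated assumptions, not code from the paper: the function names are hypothetical, and the choice to count the outgoing half of the new edge toward node t before sampling is how this sketch realizes the 1/(2t-1) self-loop probability.

```python
import random
from collections import Counter


def preferential_attachment(n, m=1, rng=None):
    """Return the edge list (t, s) of a preferential attachment multigraph on nodes 1..n."""
    rng = rng or random.Random(0)
    endpoints = []  # each edge contributes both endpoints, so a node appears deg(node) times
    edges = []
    for t in range(1, n + 1):
        for _ in range(m):
            # The outgoing half of the new edge already counts toward the degree of t.
            endpoints.append(t)
            # Uniform choice from the endpoint list = choice proportional to current degree.
            s = rng.choice(endpoints)
            endpoints.append(s)
            edges.append((t, s))
    return edges


def degrees(edges):
    """Degree of every node; a self-loop contributes 2."""
    deg = Counter()
    for t, s in edges:
        deg[t] += 1
        deg[s] += 1
    return deg
```

For m = 1 and t = 2 the endpoint list is [1, 1, 2] at sampling time, so node 1 is chosen with probability 2/3 and node 2 with probability 1/3, which agrees with the formula above.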

We will present a 1-local approximation algorithm for the following simple problem on pref- 
erential attachment graphs on n nodes: given an arbitrary node u, return a minimal connected 
subgraph containing nodes $u$ and $1$ (i.e. the root of graph $G_m^n$).

Our algorithm, which we call TraverseToTheRoot, is listed as Algorithm 1. Roughly speaking,
the algorithm grows a set $S$ of nodes by starting with $S = \{u\}$ and then repeatedly adding the
node in $N(S) \setminus S$ with highest degree. In other words, the algorithm greedily adds the highest-degree
node in the local neighbourhood of its current output set. What we will show is that, with high
probability, this algorithm will traverse the root node within $O(\log^4(n))$ steps.

Algorithm 1 TraverseToTheRoot
1: Initialize a list L to contain an arbitrary node u in the graph.
2: while L does not contain node 1 do
3:     Add a node of maximum degree in N(L)\L to L.
4: end while
5: return L.

For convenience, we have defined TraverseToTheRoot assuming that the algorithm can deter- 
mine when it has successfully traversed the root. This is not necessary in general; our algorithm 
will instead have the guarantee that, after $O(\log^4(n))$ steps, it has traversed node 1 with high
probability. 

The remainder of this section will be dedicated to the proof of Theorem 3.1.

Theorem 3.1. With probability $1 - o(1)$, over the stochastic preferential attachment process on $n$
nodes, algorithm TraverseToTheRoot returns a set of size $O(\log^4(n))$.

Our proof will make use of an alternative specification of the preferential attachment process,
which is now standard in the literature [Bollobas and Riordan 2004; Doerr et al. 2011]. We will
now describe this model briefly. Sample $mn$ pairs $(x_{i,j}, y_{i,j})$ independently and uniformly from
$[0,1] \times [0,1]$ with $x_{i,j} < y_{i,j}$ for $i \in [n]$ and $j \in [m]$. We relabel the variables such that $y_{i,j}$ is
increasing in lexicographic order of indices. We then set $W_0 = 0$ and $W_i = y_{i,m}$ for $i \in [n]$. We
define $w_i = W_i - W_{i-1}$ for all $i \in [n]$. We then generate our random graph by connecting each node $i$
to $m$ nodes $p_1(i), \ldots, p_m(i)$, where each $p_k(i)$ is a node chosen randomly with $P[p_k(i) = j] = w_j / W_i$
for all $j \le i$. We refer to the nodes $p_k(i)$ as the parents of $i$.
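The alternative specification can be sampled directly. The Python sketch below is an illustration under stated assumptions: the function name `sample_parents` is hypothetical, the $y$ values are drawn as the maximum of two uniforms (which has the same distribution as the larger coordinate of a uniform pair conditioned on $x < y$), and each parent is drawn with a fresh uniform point on $[0, W_i]$, which realizes the probabilities $w_j / W_i$ stated above.

```python
import bisect
import random


def sample_parents(n, m, rng=None):
    """Sample W_0..W_n, w_1..w_n and the m parents of every node 1..n."""
    rng = rng or random.Random(0)
    # Sorted y-values: position i*m (1-indexed) plays the role of y_{i,m}.
    ys = sorted(max(rng.random(), rng.random()) for _ in range(n * m))
    W = [0.0] + [ys[i * m - 1] for i in range(1, n + 1)]
    w = [0.0] + [W[i] - W[i - 1] for i in range(1, n + 1)]  # w[0] is unused padding
    parents = {}
    for i in range(1, n + 1):
        chosen = []
        for _ in range(m):
            x = rng.uniform(0.0, W[i])
            # Smallest j in 1..i with W[j] >= x; the interval (W[j-1], W[j]] has length w[j].
            j = bisect.bisect_left(W, x, 1, i + 1)
            chosen.append(min(j, i))
        parents[i] = chosen
    return W, w, parents
```

Because $\sum_{j \le i} w_j = W_i$, the probabilities over $j \le i$ sum to one, and the case $j = i$ corresponds to a self-loop.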

Bollobas and Riordan showed that the above random graph process is equivalent to the preferential
attachment process. They also show the following useful properties of this alternative model.
Set $s_0 = 160 \log(n) (\log\log(n))^2$ and $s_1 = […]$. Let $I_t = [2^t + 1, 2^{t+1}]$. Define constants $\beta = 1/4$
and $\zeta = 30$.



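Lemma 3.2 ([Bollobas and Riordan 2004]). Let $m \ge 2$ be fixed. Using the definitions above, each
of the following events holds with probability $1 - o(1)$:

• $E_1 = \{ |W_i - \sqrt{i/n}| \le \frac{1}{10}\sqrt{i/n} \text{ for } s_0 \le i \le n \}$.

• $E_2 = \{ I_t \text{ contains at most } \beta |I_t| \text{ nodes } i \text{ with } w_i < \frac{1}{\zeta \sqrt{in}} \text{ for } \log(s_0) \le t < \log(s_1) \}$.

• $E_3 = \{ w_1 \ge \frac{4}{\log(n)\sqrt{n}} \}$.

• $E_4 = \{ w_i \ge \frac{1}{\log^{[…]}(n)\sqrt{n}} \text{ for all } i < s_0 \}$.

• $E_5 = \{ w_i \le […] \text{ for } s_0 \le i \le n \}$.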



As has been observed elsewhere [Doerr et al. 2011], this process differs slightly from the preferential attachment
process in that it tends to generate more self-loops. However, it is easily verified that all proofs in this section continue 
to hold if the probability of self-loops is reduced. 






Note that we modified these events slightly for our purposes: event $E_2$ uses different constants
$\beta$ and $\zeta$, and in event $E_4$ we provide a bound on $w_i$ for all $i < s_0$ rather than $i < n^{1/5}$. Finally,
event $E_5$ is a minor variation on the corresponding event from [Bollobas and Riordan 2004]. We
provide additional details on how to modify the proof from Bollobas and Riordan [2004] for our
purposes in Appendix A.1.

Given Lemma 3.2, we can think of the $W_i$'s as arbitrary fixed values that satisfy events
$E_1, \ldots, E_5$, rather than as random variables. Lemma 3.2 implies that, if we can prove Theorem 3.1
for random graphs corresponding to all such sequences of $W_i$'s, then it will also hold for
preferential attachment graphs.

We now provide some intuition into our proof of Theorem 3.1. We would like to show that
the algorithm queries nodes of progressively higher degrees over time. However, if the algorithm
queries a node $i$ of degree $d$, there is no guarantee that subsequent nodes queried will have degree
greater than d. Suppose, however, that there were a path from i to the root consisting entirely of 
nodes with degree at least d. In this case, the algorithm will only ever traverse nodes of degree at 
least d from that point onward. One might therefore hope that the algorithm finds nodes that lie 
on such "good" paths for ever higher values of d, representing progress toward the root. 

Motivated by this intuition, we will study the probability that any given node i lies on a path 
to the root consisting of only high-degree nodes (i.e. not much less than the degree of i). We will 
argue that many nodes in the network lie on such paths. We prove this in two steps: first, we show 
that for any given node $i$ and parent $p_k(i)$, $p_k(i)$ will have high degree relative to $i$ with probability
greater than 1/2 (Lemma 3.4). Second, since each node $i$ has at least two parents, we use the
theory of supercritical branching processes to argue that, with constant probability for each node
$i$, there exists a path to a node close to the root following links to such "good" parents (Lemma
3.5).

This approach is complicated by the fact that, for two nodes i and j, the existence of a high- 
degree-node path from i to the root is not independent of the existence of such a path from j to the 
root. To conclude that these paths occur "often" in the network, we must modify our argument to 
focus on independent events. We will require that once we look at a node $\ell$ in an attempt to find
a path to the root, we never again use node $\ell$ in our analysis. We therefore assume in our proofs
that there is a set of "forbidden" nodes (set $\Gamma$, below) that cannot be used. We must ensure that
these forbidden nodes are few enough that they do not interfere with our ability to find paths to 
the root. 

We will now proceed with the details of the proof. The proofs of many of the technical lemmas 
below have been deferred to Appendix A. We start with a definition.

Definition 4 (Typical node). A node $i$ is typical if either $w_i \ge \frac{1}{\zeta\sqrt{in}}$ or $i < s_0$.

Note that event $E_2$ implies that each interval $I_t$, $\log(s_0) \le t < \log(s_1)$, contains a large number
of typical nodes. The lemma below encapsulates concentration bounds on the degrees of nodes in 
the network. 

Lemma 3.3. The following events hold with probability $1 - o(1)$:

• $E_6 = \{ \forall i \ge s_0 : \deg(i) \le 6m \log(n) \sqrt{n/i} \}$.

• $E_7 = \{ \forall s_0 \le i \le s_1 \text{ that is typical} : \deg(i) \ge […]\sqrt{n/i} \}$.

• $E_8 = \{ \forall i < s_0 : \deg(i) \ge […] \}$.

We will next show that, for any set $\Gamma$ that does not contain too many nodes from any interval
$I_t$, and any given parent of a node $i$, with probability greater than 1/2 the parent will be
typical, not in $\Gamma$, and not in the same interval as $i$.

Definition 5 (Sparse set). We will say that a subset of nodes $\Gamma \subseteq [n]$ is sparse if
$|\Gamma \cap I_t| \le |I_t|/\log\log(n)$ for all $\log s_0 \le t \le \log s_1$. That is, $\Gamma$
does not contain more than a $1/\log\log n$ fraction of the nodes in any interval $I_t$ contained in
$[s_0, s_1]$.

Lemma 3.4. Fix any sparse set $\Gamma$. Then for each $i$ with $s_0 \le i \le s_1$, and each
$k \in [m]$, the following statements are all true with probability at least 8/15:
$p_k(i) \notin \Gamma$, $p_k(i) \le i/2$, and $p_k(i)$ is typical.

We now wish to argue that, for any given node $i$ and sparse set $\Gamma$, it is likely that there
is a short path from $i$ to vertex 1 consisting entirely of typical nodes that do not lie in
$\Gamma$. Our argument is via a coupling with a supercritical branching process. Consider growing a
subtree, starting at node $i$, by adding to the subtree any parent of $i$ that satisfies the
conditions of Lemma 3.4, and then recursively growing the tree in the same way from any parents that
were added (if any). Since each node has at least two parents, and each satisfies the conditions of
Lemma 3.4 with probability greater than 1/2, this growth process is supercritical and should survive
with constant probability (within the range of nodes for which Lemma 3.4 applies). We should
therefore expect that, with constant probability, such a subtree would contain at least one node
$j \le s_0$.

Our next lemma, Lemma 3.5, will make this intuition precise. First, let us formally define the
subtree structure alluded to above. Fix a sparse set $\Gamma$ and a node $i$ with
$s_0 \le i \le s_1$. We will define a set of nodes $H_\Gamma(i)$ that corresponds to the subtree
described above: $H_\Gamma(i)$ is the union of a sequence of sets $H_0, H_1, \ldots$, which we
define recursively as follows. First, $H_0 = \{i\}$. Then, for each $\ell \ge 1$, $H_\ell$ will be a
subset of all the parents of the nodes in $H_{\ell-1}$. For each $j \in H_{\ell-1}$ and
$k \in [m]$, we will add $p_k(j)$ to $H_\ell$ if and only if the following conditions hold:

1. $p_k(j)$ is typical,

2. $p_k(j) \notin \Gamma$,

3. $p_k(j) \le j/2$,

4. $p_k(j) \notin H_r$ for all $r < \ell$, and

5. for the interval $I_t$ containing $p_k(j)$, $|I_t \cap (H_0 \cup \ldots \cup H_\ell)| < 10\log\log n$.

Let us explain these five conditions briefly. Conditions 1-3 are precisely the conditions of Lemma
3.4. Condition 4 is that $p_k(j)$ has not already been added to the subtree; we add this condition
so that the sets of parents of any two nodes in the subtree are independent. Finally, condition 5 is
that the subtree cannot contain more than $10\log\log n$ nodes from any given interval $I_t$. We will
use this condition to later argue that $\Gamma$ remains sparse if we add all the elements of
$H_\Gamma(i)$ to $\Gamma$.

We finally define $H_\Gamma(i) = H_0 \cup H_1 \cup \ldots$. We are now ready to state our next
probabilistic lemma, which roughly states that any node $i$ is contained on a short path to the root
consisting of only typical nodes with probability at least 1/5.

Lemma 3.5. Fix any sparse set $\Gamma$. Then for each node $i$ with $s_0 \le i \le s_1$, the
probability that $H_\Gamma(i)$ contains a node $j \le s_0$ is at least 1/5.






Proof. Fix $\Gamma$ and $i$, and write $H = H_\Gamma(i)$. Let $C = [s_0]$, the set of all nodes with
index $s_0$ or less. We will show that the probability that $H \cap C = \emptyset$ is at most 4/5.

Let $t$ be such that $i \in I_t$. We will say that $H$ saturates a given interval $I_t$ if
$|H \cap I_t| = 10\log\log n$. (Note that we must have $|H \cap I_t| \le 10\log\log n$, from the
definition of $H$.) Let us first consider the probability that $H \cap C = \emptyset$ and $H$ does
not saturate any intervals. Since $H$ does not saturate any intervals, and since the set
$H \cup \Gamma$ is itself a sparse set, for each node $j \in H$ and $k \in [m]$ the parent $p_k(j)$
will be added to $H$ precisely if the conditions of Lemma 3.4 hold, which occurs with probability at
least 8/15. We can therefore couple the growth of the subtree $H$ within the range $[s_0, i]$ with
the growth of a branching process in which each node spawns up to two children, each with
probability at least 8/15. In this coupling, the event $H \cap C = \emptyset$ implies the event that
this branching process generates only finitely many nodes. Write $p$ for the probability that the
branching process generates infinitely many nodes. Then
$p = \frac{8}{15}p + \left(1 - \frac{8}{15}p\right)\frac{8}{15}p$, from which we obtain
$p = \frac{15}{64}$. We therefore have $\mathbb{P}[H \cap C = \emptyset] \le 1 - p = \frac{49}{64}$
conditional on $H$ not saturating any intervals.

Next consider the probability that $H \cap C = \emptyset$ given that $H$ does saturate some
interval. In this case, there is some smallest $t$ such that $I_t$ is saturated by $H$. Then, given
that $H$ saturates $I_t$ but no interval $I_{t'}$ for $t' < t$, we can again couple the growth of
subtree $H$ from interval $I_t$ onward with $10\log\log n$ instances of the branching process
described above, each one starting at a different node in $H \cap I_t$. In this case, the
probability that $H \cap C = \emptyset$ is bounded by the probability that all $10\log\log n$ copies
of the branching process generate only finitely many children. This probability is at most
$(49/64)^{10\log\log n} = o(1/\log^2 n)$. Thus, taking the union bound over all possibilities for
the value of $t$ (of which there are at most $\log n$), the probability that $H \cap C = \emptyset$
given that $H$ saturates some interval is at most $o(\log n/\log^2(n)) = o(1)$.

Combining these two cases, we see that $\mathbb{P}[H \cap C = \emptyset] \le 49/64 + o(1) < 4/5$. ■
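
The fixed-point computation in the proof can be checked numerically. The following is a minimal sketch, assuming the idealized process in which every node has exactly two potential children, each kept independently with probability exactly $q = 8/15$ (the proof only needs "at least"). The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def survival_probability(q: Fraction) -> Fraction:
    """Nonzero solution of p = q*p + (1 - q*p)*q*p, which simplifies to (2q - 1) / q^2.

    Assumes q > 1/2, i.e. the two-child process is supercritical.
    """
    if q <= Fraction(1, 2):
        raise ValueError("process is not supercritical for q <= 1/2")
    return (2 * q - 1) / (q * q)


def extinction_by_iteration(q: float, rounds: int = 500) -> float:
    """Iterate e -> (1 - q + q*e)^2 from e = 0.

    The map is the offspring generating function of the idealized process,
    so the iterates increase to the extinction probability.
    """
    e = 0.0
    for _ in range(rounds):
        e = (1.0 - q + q * e) ** 2
    return e


q = Fraction(8, 15)
p = survival_probability(q)
print("survival:", p, "extinction:", 1 - p)
print("iterated extinction estimate:", extinction_by_iteration(8 / 15))
```

The exact value and the iterated estimate should agree, and the extinction probability stays below the 4/5 threshold used in the lemma.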

Lemma 3.5 implies the following corollary, which we will use in our analysis of the algorithm
TraverseToTheRoot. First a definition.

Definition 6 (Good paths). For any given node Sq < % < s\, we say that i has a good path if 
there is a path from i to some node j < sq consisting entirely of nodes with degree at least S 



Lemma 3.6. Choose any set $T$ of at most $16\log n$ nodes from $[s_0, s_1]$. Then for each node
$i \in T$, $i$ has a good path with probability at least 1/5, independently for each $i \in T$.

Roughly speaking, Lemma 3.6 states that for any sufficiently small set of nodes, each node $i$ in
that set will have a good probability of being on a path to some $j \le s_0$ consisting entirely of
nodes with degree not much smaller than the degree of $i$. In particular, we will apply Lemma 3.6 to
the set of nodes queried by TraverseToTheRoot to argue that significant progress toward the root is
made after every sequence of polylogarithmically many steps.

We are now ready to proceed with the proof of Theorem 3.1.

Proof of Theorem 3.1. Our analysis will consist of three steps, in which we consider three phases of
the algorithm. The first phase consists of all steps up until the first time TraverseToTheRoot
traverses a node $i \le s_1$ with a good path. The second phase then lasts until the first time the
algorithm queries a node $i \le s_0$. Finally, the third phase ends when the algorithm traverses
node 1. We will show that each of these phases lasts at most $O(\log^4(n))$ steps.

We note that we will make use of Lemma 3.6 in our analysis by way of considering whether
certain nodes have good paths. We will check at most $16\log n$ nodes in this manner, and hence the
conditions of Lemma 3.6 will be satisfied.

Analysis of phase 1. Phase 1 begins with the initial node $u$, and ends when the algorithm
traverses a node $i \le s_1$ with a good path. We divide phase 1 into a number of iterations.
Iteration zero starts at node $u$. Define iteration $t$ as the first time, after iteration $t - 1$,
that the algorithm queries a node $i \le s_1$.

Each new node i considered in iteration t will have i < s\ with probability at least VF^/l > 
2 13 log n ' re g ar dless of the previous nodes traversed. By the multiplicative Chernoff bound (jG.ip . 
with probability at least $1 - 1/n^2$, after at most $5\log^2(n)$ steps such a node $i \le s_1$
would be found. By Lemma 3.6, we know that node $i$ has a good path with probability at least 1/5,
independent of all nodes traversed so far.

By the multiplicative Chernoff bound (G.1), we conclude that after at most $10\log(n)$ iterations,
and total time of $O(\log^3(n))$, the algorithm traverses a node that has both $i \le s_1$ and a good path,
with probability at least 1 — lo ^ n ^ — ^. 

We note that the number of invocations of Lemma 3.6 made during the analysis of this phase is at
most $2\log n$ with high probability, and hence the cardinality restriction of Lemma 3.6 is satisfied.

Analysis of phase 2. Phase 2 begins once the algorithm has traversed some node $i \le s_1$ with a
good path, and ends when the algorithm traverses a node $j \le s_0$. We split phase 2 into a number
of epochs. For each $\log s_0 \le t \le \log s_1$, we define epoch $t$ to consist of all steps of
the algorithm during which some node $i \in I_t$ with a good path has been traversed, but no node in
any $I_\ell$ for $\ell < t$ with a good path has been traversed. Define random variable $Y_t$ to be
the length of epoch $t$. Note that phase 2 ends precisely when epoch $\log s_0$ ends. Further, the
total number of steps in phase 2 is $\sum_{t=\log s_0}^{\log s_1} Y_t$.

Fix some log sq < t < log si and consider Yj. Suppose the algorithm is in epoch t, and let i £ It 
be the node with a good path that has been traversed by the algorithm. Then, from the definition 
of a good path and event Ej, i has a parent j G Ii for some t < t with deg(j) > f^i/j- This node 
j is a valid choice to be traversed by the algorithm, so any node queried before j must have degree 
at least ^ yj- Moreover, traversing node j would end epoch t, so every step in epoch t traverses a 
node with degree at least By event Eq, any such node i satisfies i < zi log 2 (n) for constant 

z = (4£) 2 . But we now note that, for any node I < zilog 2 (n) traversed by the algorithm, the 
probability that I has a parent@ r < i/2 is at least -yjj- > 4 ^ ^ - . Moreover, if i has such a parent 
r, Lemma 3.6 implies that r has a good path with probability at least 1/5. Thus each step of the
algorithm results in the end of epoch t with probability at least 2Q ^ | - . We conclude that Y t is 
stochastically dominated by a geometric random variable with mean 20£logn. Also, the number 
of invocations of Lemma 3.6 made during epoch t is dominated by a geometric random variable
with mean 5. 

We conclude that X^t3og so ^* ^ s dominated by the sum of at most log n geometric random 
variables, each with mean 20Clogn = 600 log n. Concentration bounds for geometric random 
variables (Lemma IG.3|) now imply that, with high probability, this sum is at most 2 10 log 2 n. We 

4 Note that even if the algorithm queried node $\ell$ via its connection to one of its parents, it
will still have at least one other parent that is independent of prior nodes queried by the
algorithm, since $m \ge 2$.

conclude that phase 2 ends after at most $2^{10}\log^2 n$ steps with high probability. Similarly,
the total number of invocations of Lemma 3.6 made during the analysis of this phase is at most
$6\log n$ with high probability, again by Lemma G.3.



Analysis of phase 3. We turn to analyze the time it takes from the first time the algorithm
encounters a node $i \le s_0$ until node 1 is found. We start by noting that the induced graph on
the first $s_0$ nodes is connected with high probability (see [Doerr et al., 2011], Corollary 5.15). We
note that by Lemma 13.31 every node j < so has degree at least d = 5 \ "gi^ n ) ■ As there is a path 
from i to node 1 where all nodes have degree at least d, the algorithm, as it follows the highest 
neighbor of its current set S, will reach node 1 before it had traversed any node of degree less than 
d. We can therefore assume that the algorithm only traverses nodes of degree greater than d. 
By Lemma 13.31 with high probability, each node j > sq has deg(j) < 6mlog(_7')y^, and 

therefore any node j with degree greater than d must satisfy j < (60() 2 log 5,8 (n). For any such 

11 (60C)log 2 ' 9 (n) 



node, Ei implies that Wj < - — ' — — . Thus, for each such j, the probability that j is connected 
to the root is w\/Wj > n , , by event E 3 . Chernoff bounds (Lemma IG.ip then imply that 

Z log [TLj 

such an event will occur with high probability after at most $O(\log^4(n))$ steps. Thus, with high
probability, phase 3 will end after at most $s_0 + O(\log^4(n)) = O(\log^4(n))$ steps. ■



4 Applications of Fast Traversal to the Root 

In the previous section we presented an algorithm, TraverseToTheRoot, that connects any node $u$
to node 1 in at most $O(\log^4(n))$ time with high probability. We now turn to discuss applications
of this result to other algorithmic problems.



4.1 s-t connectivity 

In the s-t connectivity (or shortest path) problem we are given two nodes s, t in an undirected 
graph G(V,E) and must find a minimally connected subgraph containing s and t. We show that, 
for preferential attachment graphs, a minimally connected subgraph can be approximated well using 
only local information. 

Theorem 4.1. Let $G$ be a preferential attachment graph on $n$ nodes, and let $s$ and $t$ be two
distinct nodes in it. Then, with probability $1 - o(1)$ over the probability space generating the
preferential attachment graph, algorithm s-t-Connect, a 1-local algorithm, returns a connected
subgraph of size at most $O(\log^4(n))$ that connects $s$ and $t$.

Proof. By Theorem 3.1, with probability $1 - o(1)$, TraverseToTheRoot(G, s) returns a path from $s$
to node 1 in time $O(\log^4(n))$. Similarly, with probability $1 - o(1)$, TraverseToTheRoot(G, t)
returns a connected path from $t$ to node 1 in time $O(\log^4(n))$. Concatenating the two paths at
node 1 yields a path of length $O(\log^4(n))$ from $s$ to $t$. ■

Given Theorem 4.1, one may wonder whether it could be extended to arbitrary graphs. In
Appendix B we show that, in general, local information algorithms cannot achieve sublinear
approximations even for highly connected graphs.

Algorithm 2 s-t-Connect
1: Call TraverseToTheRoot(G, s), where the algorithm starts from node s in the graph G, and let
   $P_1$ be the list of nodes returned.
2: Call TraverseToTheRoot(G, t), where the algorithm starts from node t in the graph G, and let
   $P_2$ be the list of nodes returned.
3: return the concatenation of $P_1$ with $P_2$.
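
The listing above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `traverse_to_root` is a simplified stand-in for TraverseToTheRoot that greedily adds the highest-degree node adjacent to the current set until the root is added, the graph is a dictionary mapping each node to a set of neighbours, the root is labelled 1, and ties are broken by smallest node id. The helper `is_connected` is hypothetical and is used only to check the output.

```python
def traverse_to_root(adj, start, root=1):
    """Greedy stand-in for TraverseToTheRoot: returns the nodes in the order they were added."""
    order = [start]
    chosen = {start}
    while root not in chosen:
        # Nodes adjacent to the current set but not yet selected.
        frontier = {w for v in chosen for w in adj[v]} - chosen
        if not frontier:
            raise ValueError("root is not reachable from start")
        nxt = max(sorted(frontier), key=lambda v: len(adj[v]))
        order.append(nxt)
        chosen.add(nxt)
    return order


def s_t_connect(adj, s, t, root=1):
    """Concatenate the two traversals, dropping nodes that already appear in the first."""
    p1 = traverse_to_root(adj, s, root)
    p2 = traverse_to_root(adj, t, root)
    seen = set(p1)
    return p1 + [v for v in p2 if v not in seen]


def is_connected(adj, nodes):
    """Check that the subgraph induced by `nodes` is connected (depth-first search)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


example = {1: {2, 3, 4}, 2: {1, 5}, 3: {1, 6}, 4: {1}, 5: {2}, 6: {3}}
nodes = s_t_connect(example, 5, 6)
print(nodes, is_connected(example, nodes))
```

Because each traversal only ever adds a node adjacent to its current set, both node lists induce connected subgraphs containing the root, so their union is connected as well.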



4.2 Finding high degree nodes 

A natural question on graphs is to find a node with degree at least some fixed poly-logarithmic 
factor of the maximum degree. As we now show, the algorithm TraverseToTheRoot obtains a 
polylogarithmic approximation to this problem. 

Theorem 4.2. Let $G$ be a preferential attachment graph on $n$ nodes. Then, with probability
$1 - o(1)$, algorithm TraverseToTheRoot will return a node of degree at least $\frac{1}{\log^2(n)}$
of the maximum degree in the graph, in time $O(\log^4(n))$.

Proof. TraverseToTheRoot ends when node 1 is found. In Appendix C we prove that, with probability
$1 - o(1)$, node 1 has degree at least $\frac{m\sqrt{n}}{\log(n)}$. However, from [Bollobás, 2003],
with probability $1 - o(1)$, the maximum degree is less than $m\sqrt{n}\log(n)$. As
TraverseToTheRoot runs, with high probability, in $O(\log^4(n))$ steps, we conclude that a node of
degree at least $\frac{1}{\log^2(n)}$ times the maximum degree in the graph is found in time
$O(\log^4(n))$. ■

We note that for any fixed $k$, algorithm TraverseToTheRoot can be extended to return, with
high probability, in polylogarithmic time in $n$, a set of $k$ nodes, such that the $i$'th node in the list
has degree at least 17( log ^ n ) ) of the i'th largest degree in the network. This can be done by letting 
the algorithm stop only after covering all nodes with index smaller than sq which includes, with 
high probability, the k highest degree nodes. 

We also note that one cannot hope to extend the method given above to general graphs. This
is discussed at the end of the proof of Theorem D.2.

Finally, in Appendix D we consider the optimization problem in which the goal is to maximize
the ratio $|D(S)|/|S|$, which we call the maximum "gain per cost" coverage problem. For this
problem we note that the TraverseToTheRoot algorithm obtains a polylogarithmic approximation
in $O(\log^4(n))$ queries.

5 Minimum Dominating Set on Arbitrary Networks 

We now consider the problem of finding a dominating set $S$ of minimal size for a given arbitrary
graph $G$. Even without any information restrictions on the structure of the network, it is known
to be hard to approximate the Minimum Dominating Set Problem to within a factor better than
$H(\Delta)$ in polynomial time, via a reduction from the set cover problem, where
$H(n) \approx \ln(n) + \gamma$ is the $n$th harmonic number. In this section we explore how much
network structure must be made visible to a local algorithm in order for it to be possible to match
this lower bound.

Guha and Khuller [Guha and Khuller, 1998] design an $O(H(\Delta))$-approximate algorithm for the
minimum dominating set problem, which can be interpreted in our framework as a $2^+$-local
algorithm. Their algorithm repeatedly selects a node in order to greedily maximize the number of
dominated nodes, where they consider only nodes within distance 2 of a previously selected node.
As we show below, the ability to observe network structure up to distance 2 is unnecessary if we
allow the use of randomness: we will construct a randomized $O(H(\Delta))$ approximation algorithm
that is $1^+$-local.

Algorithm 3 AlternateRandom
1: Select an arbitrary node $u$ from the graph and initialize $S = \{u\}$.
2: while $D(S) \ne V$ do
3:   Choose $x \in \arg\max_{v \in N(S)} \{|N(v) \setminus D(S)|\}$ and add $x$ to $S$.
4:   if $N(x) \setminus S \ne \emptyset$ then
5:     Choose $y \in N(x) \setminus S$ uniformly at random and add $y$ to $S$.
6:   end if
7: end while
8: return $S$.

5.1 A $1^+$-local Algorithm

In this section we present a $1^+$-local randomized $O(H(\Delta))$-approximation algorithm for the
minimum dominating set problem. Our algorithm obtains this approximation factor both in expectation
and with high probability in the size of the optimal solution. We note that our algorithm actually
generates a connected dominating set, so it can also be seen as an $O(H(\Delta))$ approximation to
the connected dominating set problem.

Roughly speaking, our approach is to greedily grow a subtree of the network, repeatedly adding
vertices that maximize the number of dominated nodes. Such a greedy algorithm is $1^+$-local, as
this is the amount of visibility required to determine how much a given node will add to the number
of dominated vertices. Unfortunately, this greedy approach does not yield a good approximation;
it is possible for the algorithm to waste significant effort covering a large set of nodes that are
all connected to a single vertex just beyond the algorithm's visibility. To address this issue, we
introduce randomness into the algorithm: after each greedy addition of a node $x$, we will also query
a random neighbor of $x$. The algorithm is listed as Algorithm 3.
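
As an illustration, AlternateRandom can be sketched in Python as follows. The sketch assumes a connected graph stored as a dictionary from each node to a set of neighbours, and it breaks ties in the greedy step by smallest node id; both are choices made for this example. For simplicity the code reads neighbour sets directly rather than modelling the $1^+$-local visibility constraints.

```python
import random


def dominated(adj, S):
    """Return D(S): the nodes of S together with all of their neighbours."""
    D = set(S)
    for v in S:
        D |= adj[v]
    return D


def alternate_random(adj, start, rng=None):
    """Sketch of AlternateRandom; comments refer to the lines of Algorithm 3."""
    rng = rng or random.Random()
    V = set(adj)
    S = {start}                                  # line 1
    while True:
        D = dominated(adj, S)
        if D == V:                               # line 2
            return S                             # line 8
        frontier = D - S                         # N(S): adjacent to S, not yet selected
        if not frontier:
            raise ValueError("graph is not connected")
        # line 3: greedy choice maximising newly dominated nodes
        x = max(sorted(frontier), key=lambda v: len(adj[v] - D))
        S.add(x)
        candidates = sorted(adj[x] - S)          # line 4
        if candidates:
            S.add(rng.choice(candidates))        # line 5


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
S = alternate_random(path, 0, random.Random(0))
print(sorted(S), dominated(path, S) == set(path))
```

Every pass through the loop adds at least one new node to $S$, so the sketch terminates on any finite connected graph.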

We now show that AlternateRandom obtains an $O(H(\Delta))$ approximation, both in expectation
and with high probability in the size of the optimal solution. In what follows, $OPT$ will denote
the size of the optimal dominating set in an implicit input graph.

Theorem 5.1. Algorithm AlternateRandom is $1^+$-local and returns a dominating set $S$ where
$\mathbb{E}[|S|] \le 2(1 + H(\Delta))\,OPT + 1$ and
$\mathbb{P}[|S| > 2(2 + H(\Delta))\,OPT] < e^{-OPT}$.

Proof. Correctness follows from line 2 of the algorithm. To show that it is $1^+$-local, it is
enough to show that line 3 can be implemented by a $1^+$-local algorithm. This follows because, for
any $v \in N(S)$, $|N(v) \setminus D(S)|$ is precisely equal to the degree of $v$ minus the number
of edges between $v$ and other nodes in $D(S)$.

We will next bound the expected size of $S$ via the following charging scheme. Whenever a node
$x$ is added to $S$ on line 3, we place a charge of $1/|N(x) \setminus D(S)|$ on each node in
$N(x) \setminus D(S)$. Note that these charges sum to 1, so the total charge over all nodes in $G$
increases by 1 on each invocation of line 3. We will show that the total charge placed during the
execution of the algorithm is at most $(1 + H(\Delta))\,OPT$ in expectation. This will imply that
$\mathbb{E}[(|S| - 1)/2] \le (1 + H(\Delta))\,OPT$, as required.

Let $T$ be an optimal dominating set. We will partition the nodes of $G$ as follows: for each
$i \in T$, choose a set $S_i \subseteq D(\{i\})$ containing $i$ such that the sets $S_i$ form a
partition of $G$. Choose some $i \in T$ and consider the set $S_i$. We denote by a "step" any
execution of line 3 in which charge is placed on a node in $S_i$. We divide these steps into two
phases: phase 1 consists of steps that occur while $S_i \cap S = \emptyset$, and phase 2 is all
other steps. Note that since we never remove nodes from $S$, phase 1 occurs completely before phase 2.

We first bound the total charge placed on nodes in $S_i$ in phase 1. In each step, some number
$k$ of nodes from $S_i$ are each given some charge $1/z$. This occurs when
$|N(x) \setminus D(S)| = z$ and $|(N(x) \setminus D(S)) \cap S_i| = k$. In this case, if phase 1
has not ended as a result of this step, there is a $k/z$ probability that a node in $S_i$ is
selected on the subsequent line 5 of the algorithm, which would end phase 1. We conclude that if the
total charge added to nodes in $S_i$ on some step is $p \in [0, 1]$, phase 1 ends for set $S_i$ with
probability at least $p$. The following probabilistic lemma now implies that the expected sum of
charges in phase 1 is at most 1.

Lemma 5.2. For $1 \le i \le n$, let $X_i$ be a Bernoulli random variable with expected value
$p_i \in [0, 1]$. Let $T$ be the random variable denoting the smallest $i$ such that $X_i = 1$ (or
$n$ if $X_i = 0$ for all $i$). Then $\mathbb{E}\left[\sum_{i=1}^{T} p_i\right] \le 1$.

Proof. We proceed by induction on $n$. The case $n = 1$ is trivial. For $n > 1$,
$$\mathbb{E}\Big[\sum_{i=1}^{T} p_i\Big] = p_1 + (1 - p_1)\,\mathbb{E}\Big[\sum_{i=2}^{T} p_i \,\Big|\, X_1 = 0\Big] \le p_1 + (1 - p_1) = 1,$$
where the inequality follows from the inductive hypothesis applied to $X_2, \ldots, X_n$.
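
For small $n$ the bound of Lemma 5.2 can be verified exactly by enumerating all outcomes. The sketch below assumes the $X_i$ are independent, which is a simplification made for this example; under that assumption the expectation telescopes to $1 - \prod_i (1 - p_i)$. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def expected_stopped_sum(ps):
    """Exact E[sum_{i=1}^{T} p_i] for independent Bernoulli(p_i) variables.

    T is the index of the first success, or n if there is no success.
    """
    n = len(ps)
    total = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        prob = Fraction(1)
        for x, p in zip(outcome, ps):
            prob *= p if x == 1 else 1 - p
        T = next((i + 1 for i, x in enumerate(outcome) if x == 1), n)
        total += prob * sum(ps[:T], Fraction(0))
    return total


ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
value = expected_stopped_sum(ps)
print(value, value <= 1)
```

The enumeration is exponential in $n$ and is meant only as a sanity check on small examples.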



Next consider the sum of charges added to nodes in $S_i$ in phase 2. During phase 2, vertex $i$
is eligible to be added to $S$ on line 3. Write $u_j = |S_i \setminus D(S)|$ for the number of nodes
of $S_i$ not dominated on step $j$ of phase 2. Then, on each step $j$, $u_j - u_{j+1}$ nodes in
$S_i$ are added to $D(S)$, and at least $u_j$ total nodes in $G$ are added to $D(S)$ (since this
many would be added if $i$ were chosen, and each choice is made greedily). Thus the total charge
added on step $j$ is at most $\frac{u_j - u_{j+1}}{u_j}$.

Suppose $u_k = 0$ (which must be true for some $k \le \Delta$). The sum of charges over all of
phase 2 is therefore at most
$$\sum_{j=1}^{k-1} \frac{u_j - u_{j+1}}{u_j} \le \sum_{j=1}^{k-1} \frac{1}{j} \le H(k) \le H(\Delta).$$


So the expected sum of charges over both phases is at most $1 + H(\Delta)$, as required.
The proof that $\mathbb{P}[|S| > 2(2 + H(\Delta))\,OPT] < e^{-OPT}$ appears in Appendix E.



5.2 A lower bound for 1-local algorithms 

Given that we can construct a $1^+$-local algorithm with approximation factor nearly matching the
lower bound for arbitrary polytime algorithms, a natural question is whether a 1-local algorithm can
do just as well. In this section we show that the answer is no: there exist networks for which there
is a dominating set of constant size, but a 1-local algorithm requires $\Omega(n)$ queries in
expectation to find any dominating set.



Theorem 5.3. For any randomized 1-local algorithm $A$ for the min dominating set problem, there
exists an input instance $G$ for which $\mathbb{E}[|S|] = \Omega(n) \cdot OPT$, where $S$ denotes
the output generated by $A$ on input $G$.

Proof. We consider a distribution over input graphs $G = (V, E)$ of size $n$, described by the
following construction process. Choose $n - 2$ nodes uniformly at random from $V$ and form a clique
on these nodes. Choose an edge at random from this clique, say $(u, v)$, and remove that edge from
the graph. Finally, let the remaining two nodes be $u'$ and $v'$, and add edges $(u, u')$ and
$(v, v')$ to $E$. By Yao's minimax principle [Yao, 1977], it suffices to consider the expected
performance of a deterministic 1-local algorithm on inputs drawn from this distribution.

Note that each such graph has a dominating set of size 2, namely $\{u, v\}$. Moreover, any
dominating set of $G$ must contain at least one node in $C = \{u, v, u', v'\}$, and hence a 1-local
algorithm must query a node in $C$. However, if no nodes in $C$ have been queried, then nodes $u$
and $v$ are indistinguishable from other visible unqueried nodes (as they all have degree $n - 3$).
Thus, until the algorithm queries a node in $C$, any operation is equivalent to querying an
arbitrary unqueried node from $V \setminus \{u', v'\}$. With high probability, $\Omega(n)$ such
queries will be executed before a node in $C$ is selected. ■
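
The construction in this proof is easy to build and inspect. The sketch below is a hypothetical helper written for this example: it assumes nodes labelled $0, \ldots, n-1$ and $n \ge 5$, and it returns the adjacency sets together with the four special nodes so that the properties used in the proof can be checked directly.

```python
import random


def hard_instance(n, rng=None):
    """Build the lower-bound graph: a clique on n - 2 nodes with edge (u, v) removed,
    plus pendant nodes u' and v' attached to u and v respectively."""
    if n < 5:
        raise ValueError("this sketch assumes n >= 5")
    rng = rng or random.Random()
    nodes = list(range(n))
    rng.shuffle(nodes)
    clique, (u_prime, v_prime) = nodes[: n - 2], nodes[n - 2:]
    adj = {w: set() for w in range(n)}
    for a in clique:
        adj[a] = set(clique) - {a}
    u, v = rng.sample(clique, 2)
    adj[u].discard(v)
    adj[v].discard(u)
    adj[u].add(u_prime)
    adj[u_prime].add(u)
    adj[v].add(v_prime)
    adj[v_prime].add(v)
    return adj, (u, v, u_prime, v_prime)


adj, (u, v, u_prime, v_prime) = hard_instance(10, random.Random(0))
clique_nodes = set(adj) - {u_prime, v_prime}
print({len(adj[w]) for w in clique_nodes})       # all clique nodes share one degree
print(({u, v} | adj[u] | adj[v]) == set(adj))    # {u, v} is a dominating set
```

Since every clique node has the same degree, degree information alone does not distinguish $u$ and $v$ from the other clique nodes.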



6 Partial Coverage Problems 

We next study problems in which the goal is not necessarily to cover all nodes in the network, but 
rather to dominate only those sections of the network that can be covered efficiently. We consider 
two problems in this domain: the partial dominating set problem, in which the goal is to dominate a 
constant fraction of the network with as few nodes as possible; and the neighbor collecting problem, 
in which the goal is to minimize the number of vertices left uncovered plus the cost of the output 
set. Many of the proofs in this section have been deferred to Appendix F.

6.1 Partial Dominating Set 

In the partial coverage problem we are given a parameter $\rho \in (0, 1]$. The goal is to find the
smallest set $S$ such that $|D(S)| \ge \rho n$. The case $\rho = 1$ is the minimum dominating set
problem. We begin with a negative result: for any constant $k$ and any $k$-local algorithm, there
are graphs for which the optimal solution has constant size, but with high probability
$\Omega(\sqrt{n})$ queries are required to find any $\rho$-partial dominating set. Our example will
apply to $\rho = 1/2$, but it can be extended easily to any constant $\rho \in (0, 1)$.

Theorem 6.1. For any randomized $k$-local algorithm $A$ for the partial dominating set problem
with $\rho = 1/2$, there exists an input instance $G$ with optimal partial dominating set $OPT$ for
which $\mathbb{E}[|S|] = \Omega(\sqrt{n}) \cdot |OPT|$, where $S$ denotes the output generated by
$A$ on input $G$.

Motivated by this lower bound, we consider a setting in which the algorithm need not cover
precisely $\rho n$ vertices, but is instead allowed a small margin for error. We consider a
bicriterion result: given some $\epsilon > 0$, we will compare the performance of an algorithm that
covers $\rho n$ nodes with the optimal solution that covers $\rho(1 + \epsilon)n$ nodes. We will
assume that parameters are chosen so that $\rho(1 + \epsilon) < 1$.

Algorithm 4 AlternateRandomAndJump
1: Initialize $S = \emptyset$.
2: while $D(S) \ne V$ do
3:   Choose a node $u$ uniformly at random from the graph and add $u$ to $S$.
4:   Choose $x \in \arg\max_{v \in N(S)} \{|N(v) \setminus D(S)|\}$ and add $x$ to $S$.
5:   if $N(x) \setminus S \ne \emptyset$ then
6:     Choose $y \in N(x) \setminus S$ uniformly at random and add $y$ to $S$.
7:   end if
8: end while
9: return $S$.

Our approach to this problem will be similar to Algorithm 3 in Section 5. Our algorithm, listed
as Algorithm 4, will grow a subgraph by alternating between greedily maximizing the number of
dominated nodes and querying a random neighbor of a previously selected node. However, we
note that such an algorithm may begin growing its tree in a section of the network that cannot be
covered efficiently, leading to poor performance. To deal with this issue, Algorithm 4 also
regularly adds a randomly chosen vertex from the graph, potentially initiating the growth of a new
subtree. Since we compare against an optimal solution that must cover $\epsilon n$ additional nodes,
such a random jump always has a non-negligible chance of landing on a previously unexplored node
covered by the optimal solution.
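
A minimal Python sketch of AlternateRandomAndJump follows. It makes one assumption beyond the printed listing: the loop stops once at least a `rho` fraction of the nodes is dominated, which matches the guarantee of Theorem 6.2, while `rho=1.0` recovers the stopping rule of the listing. The graph representation (a dictionary of neighbour sets) and the tie-breaking rule are likewise choices made for this example.

```python
import math
import random


def dominated(adj, S):
    """Return D(S): the nodes of S together with all of their neighbours."""
    D = set(S)
    for v in S:
        D |= adj[v]
    return D


def alternate_random_and_jump(adj, rho=1.0, rng=None):
    """Sketch of AlternateRandomAndJump with an assumed coverage target rho * n."""
    rng = rng or random.Random()
    nodes = sorted(adj)
    target = math.ceil(rho * len(nodes))
    S = set()
    while len(dominated(adj, S)) < target:
        S.add(rng.choice(nodes))                 # random jump
        D = dominated(adj, S)
        frontier = D - S                         # N(S): adjacent to S, not yet selected
        if not frontier:
            continue                             # nothing visible to extend; jump again
        # greedy step, ties broken by smallest node id
        x = max(sorted(frontier), key=lambda v: len(adj[v] - D))
        S.add(x)
        candidates = sorted(adj[x] - S)
        if candidates:
            S.add(rng.choice(candidates))        # random neighbour of x
    return S


cycle = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
S = alternate_random_and_jump(cycle, rho=0.5, rng=random.Random(0))
print(sorted(S), len(dominated(cycle, S)))
```

On a connected graph each iteration adds at least one new node to $S$ while the target is unmet, so the loop terminates.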

Theorem 6.2. Given any $\rho \in (0, 1)$, $\epsilon \in (0, \rho^{-1} - 1)$, and set of nodes $OPT$
with $|D(OPT)| \ge \rho(1 + \epsilon)n$, Algorithm 4 returns a set $S$ of nodes with
$|D(S)| \ge \rho n$ and $\mathbb{E}[|S|] \le 3|OPT|(\rho\epsilon)^{-1}H(\Delta)$.

6.2 The Neighbor Collecting Problem 

We next consider the objective of minimizing the total cost of the selected nodes plus the number
of nodes left uncovered. Formally, the goal is to choose a set $S$ of nodes of $G$ that minimizes
$f(S) = c|S| + |V \setminus D(S)|$ for a given parameter $c > 0$. This "Neighbor-Collecting"
problem is motivated by the Prize-Collecting Steiner Tree problem.
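
To make the objective concrete, the following sketch evaluates $f(S)$ for a graph stored as a dictionary of neighbour sets. The function name and the graph representation are assumptions made for this example.

```python
def neighbor_collecting_cost(adj, S, c):
    """Evaluate f(S) = c * |S| + |V \\ D(S)| for a graph given as {node: set_of_neighbours}."""
    D = set(S)
    for v in S:
        D |= adj[v]                  # D(S): S together with its neighbours
    uncovered = set(adj) - D
    return c * len(S) + len(uncovered)


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(neighbor_collecting_cost(path, {1}, 2))
print(neighbor_collecting_cost(path, {1, 3}, 2))
```

On this five-node path with $c = 2$, selecting only node 1 leaves nodes 3 and 4 uncovered, whereas selecting nodes 1 and 3 covers everything at a higher selection cost.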

We first note that when $c \le 1$ the problem reduces to the minimum dominating set problem: it
is always worthwhile to cover all nodes. Assuming $c > 1$, the $1^+$-local algorithm for the minimum
dominating set problem achieves an $O(cH(\Delta))$ approximation.

Theorem 6.3. For any $c > 1$ and set $OPT$ minimizing $f(OPT)$, algorithm AlternateRandom
returns a set $S$ for which $\mathbb{E}[f(S)] \le 2c(1 + H(\Delta))f(OPT)$.

Proof. We have $f(OPT) = |V \setminus D(OPT)| + c|OPT|$ and
$f(OPT \cup (V \setminus D(OPT))) = c|OPT \cup (V \setminus D(OPT))| = c|OPT| + c|V \setminus D(OPT)| \ge c|T^*|$,
where $T^*$ is a minimum dominating set of the graph. Next, we know from Theorem 5.1 that
$|T^*| \ge (2(1 + H(\Delta)))^{-1}\,\mathbb{E}[|S|] = (2(1 + H(\Delta)))^{-1} c^{-1}\,\mathbb{E}[f(S)]$.
Finally, $f(OPT) = |V \setminus D(OPT)| + c|OPT|$, so
$c|OPT| + c|V \setminus D(OPT)| \le c f(OPT)$. We conclude that
$\mathbb{E}[f(S)] \le 2(1 + H(\Delta))\,c\,f(OPT)$. ■

Since the neighbor-collecting problem contains the minimum dominating set problem as a special
case (i.e. when $c = 1$), we cannot hope to avoid the dependency on $H(\Delta)$ in the approximation
factor in Theorem 6.3. As we next show, the dependence on $c$ in the approximation factor we
obtain in Theorem 6.3 is also unavoidable.

Theorem 6.4. For any randomized $k$-local algorithm $A$ for the neighbor-collecting problem, there
exists an input instance $G$ for which
$\mathbb{E}[f(S)] = \Omega(\max\{c, \log \Delta\}) \cdot f(OPT)$, where $S$ denotes the output
generated by $A$ on input $G$.

Finally, one cannot move from $1^+$-local algorithms to 1-local algorithms without significant loss:
every 1-local algorithm has a polynomial approximation factor.

Theorem 6.5. For any randomized 1-local algorithm $A$ for the neighbor-collecting problem, there
exists an input instance $G$ for which $\mathbb{E}[f(S)] = \Omega(\sqrt{n}/c) \cdot f(OPT)$, where
$S$ denotes the output generated by $A$ on input $G$.

7 Conclusions 

We presented a model of computation in which an algorithm is constrained in the information that 
it has about the input structure, which is revealed over time as expansive exploration decisions are 
made. This algorithmic framework is motivated by external users in a network who cannot freely 
make arbitrary queries to the network structure. 

Our motivation lies in determining whether and how an individual in a network can efficiently 
solve optimization problems in a local manner, especially when the ability to do so is driven by 
the properties of the network. Our results suggest that properties inherent in the structure of 
social and technological networks may be crucial in obtaining strong performance bounds in this 
local information framework. A possible avenue of future work is to analyze datasets for social and 
web graphs to determine whether local, greedy search algorithms actually do perform as well as 
predicted at such tasks as finding short paths between individuals and finding high-degree vertices. 

Another implication is that the designer of a network interface, such as an online social network 
platform, may gain from considering the power and limitations that come with the design choice 
of how much network topology to reveal to individual users. On one hand, revealing too little 
information may restrict natural social processes that users expect to be able to perform, such 
as searching for potential connections on a network of professional contacts. On the other hand, 
revealing too much information may raise privacy concerns, or enable unwanted deviant behaviour 
such as automated advertising systems searching for central individuals to target. Our results 
suggest that minor changes to the structural information made available to a user can have a large 
impact in determining the class of network optimization problems that can be reasonably solved. 
APPENDIX

A Omitted proofs from Section 3

A.1 Proof of Lemma 3.2

In this section we provide details for the proof of Lemma 3.2. This result follows that of Bollobás
and Riordan [Bollobás and Riordan 2004] quite closely; we present the differences only briefly for
completeness.

The proof that events $E_1$, $E_2$, and $E_3$ hold with high probability follows entirely without change,
except for the modification of certain constants. We therefore omit the details here.

We next show that event $E_4$ holds with high probability, by showing that $\Pr[E_4^c \cap E_1] = o(1)$.

Suppose that E% n E x holds and let S = t i. M j n) ^ - As Ei holds we have W So < lllo s lo sW lo g W . 

As E$ does not hold there exists some interval [x,x + 5] with < x < 11 log log ^V log( - n ) . that 
contains two of the Wi and hence two of the yij. Each such interval is contained in some interval 

J t = [tS,(t + 2)5] for < t < s~ l Uloglo ^ n jy iog[n) < 21og 2 ' 5 (n). The probability that some y id 

lands in such an interval is (At + 4)<5 2 , so the probability that at least two lie in Jt is at most 
m 2 n 2 (At + 4) 2 <5 4 /2 < 32m 2 / log 2 ' 6 (n). Thus 

21og 2 ' 5 (n) 

¥{E% n£i)< 32m 2 / log 2 ' 6 (?i) = o(l) 

t=o 

as required, given k > 2. 

We will next show that the event E§ holds with high probability. Recall that event E§ is 
{wi < —yfi^- f° r sq < i < n}. We will show that P(E^ n E\) = oil), which will imply that 
P(E , 5) = 1 — o(l) as required. 

Suppose that E§ n E\ holds. Then there is some i > sq is such that u>i > l °^n^ ■ Define 

6 = log ^ ■ it must therefore be that the interval (Wi—i , Wi—\ + 5} does not contain Wi, and hence 
contains at most m — 1 of the y^j. Since E\ holds, we must have W So > jq\J~^- We now define 
a partition of [75^/7^, 1] into intervals Jt = [xt,xt+i) for t > 0, where we define xq = anc ^ 
rr/ = + log ( n ) for all t > 1, until x+ > 1. We note that there are at no more than mn intervals 
Jt in total. We also note that, since E\ holds, each interval (Wi-i, Wi-i + 5] contains at least m— 1 
intervals Jf, each satisfying xt > Wi— 1, one of which must contain no j/jj since E4 does not hold. 
For a given i satisfying > Wi_i, the number of g/jj in J t has a Bi(mn,pt) distribution with 

2 2^ log(n) 21og(n) 

Pt = x t+1 - xt < 2x t = . 

Xtnm nm 

The probability that no $y_{i,j}$ lies in this interval is thus

$$(1 - p_t)^{mn} \le e^{-mnp_t} \le e^{-2\log(n)} = o(n^{-1}).$$

Summing over the $O(n)$ values of $t$ shows that $\Pr(E_5^c \cap E_1) = o(1)$, as required.



A.2 Proof of Lemma 3.3

We start by noting that deg(i) = Y^=i+i YlT=i ^~k,j where each of the Yjjs is an i.i.d Bernoulli 
random variable that gets the value of one with success probability of This follows from the 
fact the each new node j sends m edges backwards and the probability of each hitting node i is 
exactly From E\ and E$, 



log(n)-i 

By estimating the sum with an integral we get 



JL ( log(ra)4=\ JL / 
E(deg(i))< I m 9 /J 1 = J2 [ ml °s(n) 
, . . 10m , , . Jn 
E(deg«) < -— log(n)^. 

From the multiplicative Chernoff bound (|G.1|) we conclude that with probability bigger than 1 — 
1/n 2 , deg(z) < 3m log(ra)y| for a given node i. By using the union bound, event E§ then holds 
with probability 1 — 1/n. 

To prove Ej holds with probability 1 — 1/n we note that from £4 we have that for a typical 
node Wi > — This implies, similarly to the first part of the proof, that 

, / \ \ 10m \fn 
E(deg W ) > — 

As Exp(deg(i)) > 16mlog(n) for any sq < i < s± (since s± = 2 25 io g ^ n ) ' we can i nv °ke the Chernoff 
bound (jG.lj) to get that Ej holds with probability bigger than 1 — 1/n 2 for a given node i. This 

follows by thinking of deg(i) as a sum of Bernoulli random variables Yij, where Yij succeeds with 

1 

probability C ^J" . By using the union bound, event E-j then holds with probability 1 — 1/n. 

The prove that E% holds with probability 1 — 1/n follows similarly to the proof for such a claim 
for E 7 , by using the property that Wi > log^^nj^/n ' 
A.3 Proof of Lemma 3.4

We first recall the statement of the lemma. Fix any sparse set $T$. Then for each $i$, $s_0 < i < s_1$,
and each $k \in [m]$, the following statements are all true with probability at least 8/15: $p_k(i) \notin T$,
$p_k(i) \le i/2$, and $p_k(i)$ is typical.

Fix i and k. For each of the three statements in the lemma, we will bound the probability of 
that statement being false. 

First, we will show that $P[p_k(i) \text{ not typical}] \le 1/15$. Note that, given that $p_k(i)$ falls within an
interval $I_t$, this probability is bounded by the total weight of atypical nodes in $I_t$ divided by the
total weight of $I_t$. Since each atypical node $j$ has weight at most $\frac{1}{10\sqrt{jn}}$ and $j \ge 2^t$ for all $j \in I_t$,
$E_4$ implies that the total weight of atypical nodes in $I_t$ is at most

$$\beta |I_t| \frac{1}{10\sqrt{2^t n}} = \frac{\beta \sqrt{2^t}}{10\sqrt{n}}.$$

Also, $E_1$ implies that the total weight of $I_t$ is

$$W_{2^{t+1}} - W_{2^t} \ge \sqrt{\frac{2^t}{n}} \left( \frac{99\sqrt{2}}{100} - \frac{101}{100} \right).$$

Since these bounds hold for all $t$, we conclude that

$$P[p_k(i) \text{ not typical}] \le \frac{10\beta}{99\sqrt{2} - 101},$$

which will be at most 1/15 for $\beta = 1/4$.

Next, we will show that $P[p_k(i) > i/2] \le 1/3$. Event $E_1$ implies that

$$P[p_k(i) > i/2] = 1 - W_{i/2}/W_i \le 1 - \frac{99}{101\sqrt{2}} \le \frac{1}{3}.$$



Finally, we will show that $P[p_k(i) \in T] \le 1/15$. Given that $p_k(i)$ falls within an interval $I_t$, this
probability is bounded by the total weight of $I_t \cap T$ divided by the total weight of $I_t$. In this case,



due to the assumed sparsity of V and E%, the former quantity is at most lid- r= < \ — 

y J °' 1 11 (log log n)V¥n — x/ " 



Also, as above, the total weight of It is at most y %(%V2 — -^f)- Since these bounds hold for all 
t, we conclude that F\pk(i) 6 T] < 99v ^ 101 which is at most 1/15. 

Taking the union bound over these three events, we have that the probability none of them 
occur is at least 8/15 as required. 
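The arithmetic behind the three bounds can be checked numerically. The sketch below is only a sanity check, and it assumes the garbled constants in the argument above read as $10\beta/(99\sqrt{2} - 101)$ for the "not typical" event and $1 - 99/(101\sqrt{2})$ for the event $p_k(i) > i/2$; the variable names are illustrative.

```python
import math

beta = 1 / 4  # value of beta fixed in the proof

# Assumed reading of the bound on P[p_k(i) is not typical].
p_not_typical = 10 * beta / (99 * math.sqrt(2) - 101)

# Assumed reading of the bound on P[p_k(i) > i/2].
p_too_large = 1 - 99 / (101 * math.sqrt(2))

# Union bound over the three bad events, with the stated caps 1/15, 1/3 and 1/15.
p_all_good = 1 - (1 / 15 + 1 / 3 + 1 / 15)

print(p_not_typical, p_too_large, p_all_good)
```

Under these readings both computed quantities fall below the caps used in the union bound, which leaves probability 8/15 for the event that all three statements hold.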



A.4 Proof of Lemma 3.6

Write T = {ti, . . . , We will apply Lemma [331 to each node U in sequence. First, for node t\, 
define T\ = 0. Lemma [3.51 with r = T\ implies that H-p 1 (ti) contains a node j < sq with probability 
at least 1/5. 

For each subsequent node ti, define = r^-i U H-^^ii — 1). We claim that this Ti is sparse. 
To see this, recall that each Hr(ij-i) contains at most 10 log log n nodes in each interval It, and 
Ti is the union of at most 16 log n such sets, so |Tj n It\ < 160 log(n) log log(re) for each t. Since 
\It\ > so > 160 log (n) (log log(n)) 2 , we have that \T{ n It\ < |It|/log log(n) and hence I 1 , is sparse. 
Lemma 13.51 with T = Tj then implies that Hj' i (ti) contains a node j < sq with probability at least 
1/5. Moreover, this probability is independent of the events for nodes ti, ... ,U-i, since Hr^ti) is 
constrained to not depend on nodes in Tj, which contains all nodes that influenced the outcome for 
h, ■ ■ ■ , U-i. 

We conclude that, for each i, Ht^U) contains a node j < so with probability at least 1/5, 
independently for each ti. For any given i, in the case that this event occurs, H^fa) contains a 
path from ti to j consisting entirely of typical nodes, all of which are at most ti. Each of these 
nodes has weight at least ^^ t , n (since they are typical), and hence event Eq implies that each has 

degree at least i-r-7= = „ " , as required. 



B s-t connectivity in arbitrarily connected networks 

Lemma B.1. Let k, r, n > max{k,r}, be positive integers. Any r-local algorithm with success 
probability bigger than ^ for the s — t connectivity problem over the family of k-edge connected 
graphs on n nodes achieves an expected approximation, over its successful runs, worse than Q(-r^) 
on that graph family. 

Proof. We first focus our attention on proving the claim for r-local algorithms on k = 1 connected
graphs. The proof will invoke Yao's minmax principle for the performance of
Monte Carlo randomized algorithms on a family of inputs [Yao 1977]. The lemma states that
the expected cost of the optimal deterministic Monte Carlo algorithm with failure probability of
$\epsilon \in [0, 1]$ on an arbitrary distribution over the family of inputs is a lower bound on the expected cost
of the optimal randomized Monte Carlo algorithm with failure probability of | over that family of 
inputs. 

To use the lemma we will focus on Monte Carlo deterministic algorithms that have failure 
probability smaller than some small constant, say |, and analyze their performance on a uniformly 
at random chosen input from a family of inputs constructed below. Given n we construct the family 
of inputs. Each input is a graph constructed as follows: we have two distinct nodes s and t. We
define a broken path as a path on 2r + 4 nodes where the 'middle' node has one of its edges removed,
say from the part of the path connecting it to t. The graph will be made from - 2 2r ^ + ^ distinct 
broken paths from s to t together with one distinct connected path connecting s to t. The identity 
of the connected path would be chosen uniformly at random from the set of 2r+i P a ths. In total the 
family of inputs contains 2r+i members. As the algorithm is r-local, being at s or t it cannot see 
the middle node on a broken path so it cannot decide if a path is broken before traversing at least 
one node in it. A compelling property therefore holds: if the algorithm has not found the connected
path after i queries then the algorithm learns nothing about the identity of the connected path except
that it is not one of the paths it traversed so far. As the connected path is chosen uniformly at
random from all paths, the probability that after \2~FTi Q uer i es t ne connected path is found is at 
most ^. Thus, conditioned on the algorithm being successful (an event having probability at least 
|), the expected cost of finding a path from s to t must be ^(^). Using Yao's principle applied 
to Monte Carlo algorithms the worst case expected cost of a randomized algorithm on at least 
one of the inputs would be at least However, on any of the inputs, an algorithm with full 

knowledge of the graph can find the connected path in any graph in the family in 2r + 4 queries. 
The approximation ratio of the r-local algorithm would therefore be worse than f2(^-). 
It is not hard to generalize the construction to k connected graphs by replacing parts of each
path in the construction by a complete graph on those nodes. For a detailed description see below.

We create jofjEpn distinct paths, connecting s to t, each on k(2r + 4) nodes. We choose all but 
one of them to be broken. For a given path, the node at distance k(2r + 2) is chosen to be broken. 
In each broken path we form a clique between every consecutive k nodes, starting from s, up to 
the point the path is broken at. If we denote the path by p\ = s,p2, ■ ■ ■ ,P2r+4 = t The first clique 
contains nodes p\ = s,p2, ■ ■ ■ ,Pk and the last on pz_ k+1 ,pr_ k+2 , ■ ■ ■ ,pr. Similarly, we form cliques 
on the nodes on t side of the broken path on every k consecutive nodes, starting at node pr + 1. 

The graph then becomes k-edge connected. We can now repeat the argument given above. ■
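The k = 1 construction can be sketched in code to make the indistinguishability claim concrete. This is a minimal illustration under stated assumptions, not the authors' construction verbatim: the node labels, the choice of which internal node counts as the 'middle', and the helper names `build_instance`, `bfs_dist` and `local_view` are all hypothetical. Each path has 2r + 4 nodes including s and t, and on a broken path the edge leaving the middle node toward t is dropped.

```python
from collections import deque


def build_instance(num_paths, r, connected_index):
    """Adjacency sets for s, t and num_paths paths; only one path is intact."""
    internal = 2 * r + 2   # internal nodes per path (2r + 4 nodes with s and t)
    middle = r + 1         # assumed position of the 'middle' node, counted from s
    adj = {"s": set(), "t": set()}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for p in range(num_paths):
        chain = ["s"] + [(p, j) for j in range(1, internal + 1)] + ["t"]
        for a, b in zip(chain, chain[1:]):
            # On a broken path, drop the edge from the middle node toward t.
            if p != connected_index and a == (p, middle):
                continue
            add_edge(a, b)
    return adj


def bfs_dist(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def local_view(adj, path_id, r):
    """(position, degree) of the nodes of one path within distance r of s or t."""
    ds, dt = bfs_dist(adj, "s"), bfs_dist(adj, "t")
    view = []
    for node in adj:
        if isinstance(node, tuple) and node[0] == path_id:
            if min(ds.get(node, r + 1), dt.get(node, r + 1)) <= r:
                view.append((node[1], len(adj[node])))
    return sorted(view)


adj = build_instance(num_paths=5, r=2, connected_index=3)
views = [local_view(adj, p, 2) for p in range(5)]
print(all(v == views[0] for v in views), bfs_dist(adj, "s")["t"])
```

In this sketch every path looks the same within distance r of s and t, so a search that starts from s or t has no local signal telling it which path to follow first.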



C Degree of the root in preferential attachment networks 

In this section we prove that, with high probability, the root node in a preferential attachment
network has degree at least $m\sqrt{n}/\log(n)$.

Lemma C.1. Consider a preferential attachment network in which each node generates m links.
Then, with probability at least $1 - o(n^{-1})$, $\deg(1) \ge m\sqrt{n}/\log(n)$.

Proof. We will use the notation from Section [3j Recall that deg(l) = ^j=2 SfcLi where each 
of the Ykjs is an i.i.d Bernoulli random variable that gets the value of one with success probability 
^r-. From E\ and E$ in Lemma |3.2| we have 

A/\og(n)y/n\ / 40 1 

m- 



E(deg(l)) > £ rn ^l^p ] = £ 

j =s " V iov 

By estimating the sum with an integral we get 



A./Z / ~t V 9 log(ra)V7 
■ ?_So \ 10 



^ / , , . . 39m Jri 
E(deg(l)) > - 



log(n) ' 

From the multiplicative Chernoff bound (Lemma G.1) we conclude that with probability bigger than
$1 - 1/n$, $\deg(1) \ge m\sqrt{n}/\log(n)$, as required. ■
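A quick simulation can give a feel for the size of the root degree. The sketch below uses a simplified preferential attachment process that is an assumption of this example rather than the exact model analysed in Section 3: node 1 sends m edges to node 0, and every later node sends m edges to earlier nodes chosen with probability proportional to their current degree. It also assumes the bound in this section reads $m\sqrt{n}/\log(n)$. The function name is illustrative.

```python
import math
import random


def preferential_attachment_degrees(n, m, seed=0):
    """Degrees in a simplified preferential attachment multigraph on nodes 0..n-1."""
    rng = random.Random(seed)
    degree = [0] * n
    endpoints = []  # node v appears deg(v) times, so rng.choice is degree-biased
    for _ in range(m):  # seed graph: m parallel edges between nodes 0 and 1
        endpoints += [0, 1]
        degree[0] += 1
        degree[1] += 1
    for v in range(2, n):
        # Pick all m targets before adding v, so edges only point to earlier nodes.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for u in targets:
            endpoints += [u, v]
            degree[u] += 1
            degree[v] += 1
    return degree


n, m = 2000, 3
deg = preferential_attachment_degrees(n, m)
print("root degree:", deg[0])
print("m * sqrt(n) / log(n):", m * math.sqrt(n) / math.log(n))
```

Comparing the two printed values for a few seeds and sizes is a cheap way to see how conservative the bound is; the simulation is not a substitute for the proof.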

D High "gain per cost" node coverage in PA networks 

We consider the following natural problem on graphs. Given that accessing a node comes with
some fixed cost c, one would like to find a set of nodes S such that the effective "gain per cost"
is maximized, namely the size of the set of nodes covered D(S) (S and its neighbors) per the size of S
is maximized (see [Khuller et al. 1999] for an extended variant of the problem). If v is a node of
maximum degree in the graph, the solution is to choose such a node v. A potential approximation
strategy would be to quickly find a node of high degree. The following corollary follows from
Theorem 4.2.

Corollary D.1. Let G be a preferential attachment graph on n nodes. Then, with probability
$1 - o(1)$, over the probability space generating the preferential attachment graph, algorithm
TraveseToTheRoot, a 1-local information algorithm, returns a set of size at most $O(\log^4(n))$
containing a node of maximum degree in the graph. In particular, TraveseToTheRoot achieves smaller
than $O(\log^8(n))$ approximation to the "gain per cost" coverage problem on preferential attachment
graphs.

Proof. With probability 1 — o(l), after 0(log 4 (n)) a node of degree at least ^ , is found, 

4 log yn) 

achieving a "gain per cost" of 5 \^j\ n ) • As * ne highest degree is at most - v /nlog(n) so is the 
maximum "gain per cost" and the result follows. ■ 
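The "gain per cost" objective is easy to state in code. The following is a small sketch with hypothetical helper names (`dominated`, `gain_per_cost`); it takes the gain of a set S to be the number of nodes in D(S), that is S together with its neighbors, divided by |S|.

```python
def dominated(adj, S):
    """D(S): the nodes of S together with all of their neighbors."""
    covered = set(S)
    for v in S:
        covered |= adj[v]
    return covered


def gain_per_cost(adj, S):
    """Number of covered nodes per selected node."""
    return len(dominated(adj, S)) / len(S)


# Toy graph: a star with hub 0 and leaves 1..5, plus a pendant node 6 attached to leaf 5.
adj = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0, 6}, 6: {5}}

print(gain_per_cost(adj, {0}))     # hub alone covers 6 nodes
print(gain_per_cost(adj, {0, 6}))  # covers all 7 nodes but pays for two
```

On this toy graph the single maximum-degree node gives the best ratio, which matches the observation that choosing a node of maximum degree is optimal.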

Given the positive result for preferential attachment graphs one may ponder whether it could be 
extended to work on general graphs. The following theorem would show that in general, algorithms 
that even r = n*-local information algorithms, cannot achieve a good approximation. 

Lemma D.2. Let k, r, n > max{k,r}, be positive integers. Any r-local algorithm with success 
probability bigger than 3 for the "gain per cost" problem over the family of k- edge connected graphs 

on n nodes achieves an expected approximation, over its successful runs, worse than on that 

graph family. 

In particular, for k = 1 and r = o{n±) the approximation ratio grows to infinity with the number 
of graph nodes n. 

Proof. We first focus our attention on proving the claim for 1-local algorithms on k = 1 connected
graphs. The proof follows similar lines to that of Lemma B.1. We will invoke
Yao's minmax principle for the performance of Monte Carlo randomized algorithms [Yao 1977].
For that we focus on analyzing the performance of deterministic Monte Carlo algorithms on a
uniformly at random chosen input from a family of inputs. Given n we construct the family of
inputs as follows: each input is a graph made from a complete binary tree on $n - 1 - \sqrt{n}$ nodes
labeled $1, 2, \ldots, n - 1 - \sqrt{n}$. In addition one leaf node s would be a hub for $\sqrt{n}$ new spoke nodes.
We denote the subgraph on node s and its neighbors by H. Note that node s is the only node with
degree bigger than three and so any algorithm that wants to achieve a good "gain per cost" must
find that node.

Each input would correspond to a specific choice of assignment for node i. In total there are 
therefore (("-^^v 7 ")/ 2 ) inputs. Such algorithms know only the degrees of the nodes they already 
traversed. The input comes with a compelling property: if the algorithm has not found a node in 
the subgraph H after i queries then we learned nothing about the identity of s except that it is one 
of the leaf nodes not queried so far 0. Since s was chosen uniformly at random across all n ~ 1 ~V^ 

leaves, the probability that after ^ Jump and Crawl queries a node in H is found is less than |. 
To see than we note that with the many Jumps the probability of hitting H, for n large enough is 
smaller than ~ + The probability of hitting the leaf s between all leaf trees, given we don't hit 
the spokes of H is at most 1/2. By the union bound the total probability of finding s is at most 
2 + rjCe + TSo) ^ §• Thus, for any algorithm that is successful with probability, say 1 — g, the 
expected cost of finding node s is £l(y/n). 

By Yao's principle the expected cost of a randomized algorithm on one of the inputs would be
at least $\Omega(\sqrt{n})$. However, an algorithm with full knowledge of the graph can find node s with at
most $O(\log(n))$ queries on any of the inputs. We conclude that the approximation of any 0-local
algorithm on one of the inputs would be $\Omega(\sqrt{n}/\log(n))$.
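The k = 1 input family can be sketched as follows. This is an illustration under assumptions: the tree is laid out in heap order (node v has children 2v and 2v + 1), $\sqrt{n}$ is taken as the integer square root, and the function name `tree_with_hidden_hub` is hypothetical.

```python
import math
import random


def tree_with_hidden_hub(n, seed=0):
    """Complete binary tree on n - 1 - sqrt(n) nodes with one leaf turned into a hub."""
    rng = random.Random(seed)
    spokes = math.isqrt(n)
    tree_size = n - 1 - spokes
    adj = {v: set() for v in range(1, tree_size + 1)}
    for v in range(2, tree_size + 1):  # heap layout: parent of v is v // 2
        adj[v].add(v // 2)
        adj[v // 2].add(v)
    leaves = [v for v in adj if 2 * v > tree_size]
    s = rng.choice(leaves)             # the hidden hub, uniform over the leaves
    for j in range(spokes):
        spoke = tree_size + 1 + j
        adj[spoke] = {s}
        adj[s].add(spoke)
    return adj, s


adj, s = tree_with_hidden_hub(100)
high_degree = [v for v in adj if len(adj[v]) > 3]
print(s, high_degree, len(adj[s]))
```

Every tree node other than s has degree at most three, so the degrees seen while crawling away from H carry no hint about which leaf was chosen.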



5 the deterministic algorithm "knows" the distribution over inputs, namely knows that s is a leaf connected to a 
star subgraph 
To generalize the problem to k connected graphs we replace each edge (u, v) in the complete
binary tree subgraph in the construction above by a distinct path from u to v of length rk. We
then connect the first k nodes on each of the new paths replacing edges in the original graph to
form a clique among themselves, and do the same for each subsequent consecutive block of k nodes on
that path.

The graph then becomes k-edge connected. The total number of nodes becomes $C = n + (n - 1)(kr - 2)$.
The same compelling property still holds: if the algorithm has not found a node in H
after i queries then we learn nothing about the identity of s except that it is not one of the nodes queried
so far. The expected cost for finding node s would be at least As an algorithm with full 

information of the graph can find node s in at most log(C)r time, the result follows. 

We end by noting that as each node has degree at most three except node s the proof also 
provides a similar lower bound for finding a node who is at most a poly-logarithmic factor smaller 
than the maximum degree, a problem discussed in a previous section. ■ 



E Proof of Theorem 5.1



We now complete the proof of Theorem 5.1 by showing that $P[|S| > 2(2 + H(\Delta))OPT] < e^{-OPT}$.
We will use the same charging scheme; it suffices to show that the total charge placed, over all
nodes in G, is at most $(2 + H(\Delta))OPT$ with probability at least $1 - e^{-OPT}$. Note that our bound
on the charges from phase 2 in the analysis of the expected size of $|S|$ holds with probability 1.
It is therefore sufficient to bound the probability that the sum, over all i, of the charges placed in
phase 1 of $S_i$ is at most $2\,OPT$.

For each node x added to S on line 4, consider the total number of nodes in $N(x) \setminus D(S)$ that
lie in sets $S_i$ that are in phase 1. Suppose there are k such nodes, and that $|N(x) \setminus D(S)| = z$. Then
the sum of charges attributed to phase 1 increases by k/z on this invocation of line 4. Also, the
probability that any of these k nodes is added to S on the next execution of line 6 is at least k/z,
and this would end phase 1 for at least one set $S_i$.

We conclude that, if the sum of charges for phase 1 increases by some $p \in [0,1]$, then with
probability p at least one set $S_i$ leaves phase 1. Also, no more charges can be attributed to phase
1 once all sets $S_i$ leave phase 1, and there are OPT such sets. The event that the sum of charges
attributed to phase 1 is greater than $2\,OPT$ is therefore dominated by the event that a sequence
of Bernoulli random variables $X_1, \ldots, X_n$, each $X_i$ having mean $p_i$ with $\sum_i p_i > 2\,OPT$, has sum
less than OPT. However, by the multiplicative Chernoff bound (Lemma G.1), this probability is
at most 



Xi < OVT 



5> 



< 



< e 



as required. 



F Omitted proofs from Section 6
F. 1 Proof of Theorem ETTI 

Fix n and write $r = \frac{n}{2(k+1)}$. We define a distribution over input graphs on n nodes corresponding
to the following construction process. Build two stars, one with $n/2 - \sqrt{n}$ leaves and one with $\sqrt{n}$
leaves, where the nodes in these stars are chosen uniformly at random. Let v and u be the roots of
these stars, respectively. Construct r paths, each of length k + 1, again with the nodes being chosen
uniformly at random. Connect one endpoint of each path to a separate leaf of the star rooted at
v. Choose one of these r paths and connect its other endpoint to node u. By the Yao minmax
principle [Yao 1977], it suffices to consider the expected performance of a deterministic algorithm
on a graph chosen from this distribution.

For any such graph, the optimal solution contains two nodes: the root of each star. We claim
that any k-local algorithm performs at least $\sqrt{n}$ queries in expectation. First, if the algorithm does
not return the root of the smaller star as part of its solution, then it must return at least $\Omega(\sqrt{n})$
nodes and hence it must use $\Omega(\sqrt{n})$ queries. On the other hand, suppose that the algorithm does
return the root of the smaller star. Then it must have either traversed some node along
the path connecting the centers of the stars, or else found a node in the smaller star via a random
Jump query. The latter takes $\Omega(\sqrt{n})$ Jump queries, in expectation. For the former, note that an
algorithm cannot distinguish the path connecting the two stars from any other path connected to
node v, until after a vertex on one of the two paths has been queried. It would therefore take
$\Omega(r) = \Omega(n/k)$ queries in expectation to traverse one of the nodes on the path between the two
stars. We therefore conclude that any algorithm must perform at least $\Omega(\sqrt{n})$ queries in expectation
in order to construct an admissible solution.



F.2 Proof of Theorem [Ql 

We apply a slight modification of the charging argument used in Theorem 5.1. Let OPT be a set
of nodes as in the statement of the theorem. We will partition the nodes of D(OPT) as follows:
for each $i \in OPT$, choose a set $S_i \subseteq D(\{i\})$ containing i, such that the sets $S_i$ form a partition of
D(OPT).

During the execution of the algorithm we will think of each node in D(OPT) as being marked
either as Inactive, Active, or Charged. At first all nodes in D(OPT) are marked Inactive.
During the execution of the algorithm, some nodes may have their status changed to Active or
Charged. Once a node becomes Active it never subsequently becomes Inactive, and once a
node is marked Charged it remains so for the remainder of the execution. Specifically, all nodes
in $D(OPT) \cap D(S)$ are always marked Charged, in addition to any nodes that have been assigned
a charge by our charging scheme (described below). Furthermore, for each $i \in OPT$, the nodes in
$S_i$ that are not Charged are said to be Active if $i \in D(S)$; otherwise they are Inactive.

Our charging scheme is as follows. On each iteration of the loop on lines 2-7, we will either
generate a total charge of 0 or of 1. Consider one such iteration. Let u be the node that is queried
on line 2 of this iteration. If $u \notin D(OPT) \setminus D(S)$ then we will not generate any charge on this
iteration. Suppose instead that $u \in D(OPT) \setminus D(S)$. If no nodes are Active after u has been
queried$^6$ then we place a unit of charge on u. Otherwise, let x be the node selected on line 4. Let
$z = |N(x) \setminus D(S)|$ be the number of new nodes dominated by x, and let z' be the number of Active
nodes. Let $w = \min\{z, z'\}$. We will then charge 1/w to w different vertices, as follows. First, we
charge 1/w to each vertex in $D(OPT) \cap (N(x) \setminus D(S))$ (note that there are at most w such nodes).
If fewer than w nodes have been charged in this way, then charge 1/w to (arbitrary) additional
Active nodes until a total of w nodes have been charged. We mark all charged nodes as Charged.

We claim that the total expected charge placed over the course of the algorithm will be at least
pe|S|/3. To see this note that, on each iteration of the algorithm, there are at least pen nodes
in D(OPT)\D(S) (since the algorithm has not yet completed). Thus, with probability at least pe,
a node from D(OPT)\D(S) will be chosen on line 2. Thus, in expectation, at least a pe fraction
of iterations will generate a charge. Thus, on algorithm termination, the sum of the charges on all
vertices is expected to be at least pe|S|/3.

6 This situation can occur only when it is the only node in Si\D(S) for some i.

Choose some $i \in OPT$ and consider set $S_i$. We will show that the total expected charge
placed on the nodes of $S_i$ during the execution of algorithm A2 is at most $(1 + H(\Delta))$. Since
there are $|OPT|$ such sets, and since only nodes in sets $S_i$ ever receive charge, this will imply
that the total expected charge over all nodes is at most $(1 + H(\Delta))|OPT|$. We then conclude that
$pe|S|/3 \le (1 + H(\Delta))|OPT|$, completing the proof.

The analysis of the total charge placed on nodes of $S_i$ is similar to the analysis in Theorem 5.1.
In expectation, a total charge of 1 will be placed on the nodes of $S_i$ before $i \in D(S)$ (this is phase
1 in the proof of Theorem 5.1). After $i \in D(S)$, all nodes in $S_i \setminus D(S)$ are marked Active. When
a node is crawled on line 4, if k > 0 nodes in $S_i \setminus D(S)$ are Active, then it must be that $i \in D(S)$
but $i \notin S$. Thus, i is a valid choice for the node selected on line 4. So, on any such iteration, it
must be that the node selected on line 4 dominates at least k new nodes. We conclude that each
node that is charged on this iteration receives a charge of at most 1/k.

To summarize, if k nodes of $S_i$ are Active on a given iteration, then any nodes in $S_i$ can be
charged at most 1/k on that iteration. Since $k \le \Delta$, we conclude in the same manner as in
Theorem 5.1 that the total charge allocated to nodes in $S_i$, after the first node in $S_i$ becomes
Active, is at most $\sum_{k=1}^{\Delta} \frac{1}{k} = H(\Delta)$. We conclude that the total expected charge placed on all
nodes in $S_i$ is at most $1 + H(\Delta)$, as required.



F.3 Proof of Theorem 

We define a distribution over input graphs on n nodes corresponding to the following construction
process. Let k be a parameter we shall set in a moment. Create two star subgraphs, one on
$n - \sqrt{n} - 2k$ nodes and one on $\sqrt{n}$ nodes, chosen uniformly at random. We connect one arbitrary
leaf of the big star subgraph to one arbitrary leaf of the smaller star subgraph. To complete the
construction we choose k spoke nodes from the bigger star subgraph and connect each of them to
one new node of degree one. This gives us a connected graph on n vertices. By the Yao minmax
principle [Yao 1977], it suffices to consider the expected performance of a deterministic algorithm
on a graph chosen from this distribution.

Note that OPT is at most 2c + k as it can always choose the hubs of the two stars. The
worst-case cost of the min dominating set algorithm can be as large as (1 + 2k + 2)c.
This happens when the algorithm starts from a spoke in the bigger star component and needs to
traverse all k spokes that were assigned one new neighbor. Only after traversing all such nodes do we
move into the spoke of the small star subgraph and then the hub of the smaller star subgraph. Thus
the approximation ratio is at least $\frac{(1 + 2k + 2)c}{2c + k}$. This expression is the biggest (as a function of k) for
$k = \Theta(c)$. In that case the expression is $\Theta(c)$.
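The growth of this ratio in k can be tabulated directly. The sketch below assumes the ratio is the worst-case cost (1 + 2k + 2)c divided by the bound 2c + k on OPT, both taken from the text above; the function name `ratio` is illustrative.

```python
def ratio(k, c):
    """Assumed lower bound on the approximation ratio: (1 + 2k + 2)c / (2c + k)."""
    return (1 + 2 * k + 2) * c / (2 * c + k)


c = 10
values = [ratio(k, c) for k in range(0, 200)]
print(ratio(0, c), ratio(c, c), ratio(100 * c, c))
```

For fixed c the expression increases with k and stays below 2c, while already at k = c it is at least 2c/3, so taking k proportional to c gives a ratio that is linear in c.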



F.4 Proof of Theorem IOI 

We define a distribution over input graphs on n nodes corresponding to the following construction
process. Build a clique on $n - \sqrt{n}$ vertices and remove one edge (u, v). Next build a star with
$\sqrt{n} - 1$ leaves, say with root r, and label one of the leaves v'. Finally, add edge (v, v'). By the Yao
minmax principle [Yao 1977], it suffices to consider the expected performance of a deterministic
algorithm on a graph chosen from this distribution.

For this graph, the set {r, v} has cost 2c. Consider the set S returned by a 1-local algorithm;
we will show that S will have cost at least $\sqrt{n}$ with high probability. If S does not include r or u
then it must leave $\sqrt{n}$ nodes uncovered (or else contain at least $\sqrt{n}$ vertices), in which case it has
cost at least $\sqrt{n}$. So S must contain some node in the star centered at r. A node in the star can
be found either via a random query or by querying node v. Since the star contains $\sqrt{n}$ nodes, it
would take $\Omega(\sqrt{n})$ random queries to find a node in the star with high probability. On the other
hand, node v is indistinguishable from the other nodes in the $(n - \sqrt{n})$-clique until after it has been
queried; it would therefore take $\Omega(n)$ queries to the nodes in the clique to find v, again with high
probability. We conclude that the cost of S is at least $\sqrt{n}$ with high probability, as required.



G Concentration bounds 

Lemma G.1. (Multiplicative Chernoff Bound) Let $X_i$ be n i.i.d. Bernoulli random variables with
expectation $\mu$ each. Define $X = \sum_{i=1}^{n} X_i$. Then,
For $0 < \lambda < 1$, $\Pr[X < (1 - \lambda)\mu n] < \exp(-\mu n \lambda^2/2)$.
For $0 < \lambda < 1$, $\Pr[X > (1 + \lambda)\mu n] < \exp(-\mu n \lambda^2/4)$.
For $\lambda > 1$, $\Pr[X > (1 + \lambda)\mu n] < \exp(-\mu n \lambda/2)$.

Lemma G.2. (Additive Chernoff Bound) Let $X_i$ be n i.i.d. Bernoulli random variables with
expectation $\mu$ each. Define $X = \sum_{i=1}^{n} X_i$. Then, for $\lambda > 0$,
$\Pr[X < \mu n - \lambda] < \exp(-2\lambda^2/n)$.
$\Pr[X > \mu n + \lambda] < \exp(-2\lambda^2/n)$.
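The lower-tail bounds can be compared against exact binomial probabilities. The sketch below is an illustration only: the helper name `binom_cdf_below` and the parameter values are arbitrary choices for this example, and it reads the two lower-tail bounds as $\exp(-\mu n \lambda^2/2)$ and $\exp(-2\lambda^2/n)$.

```python
import math


def binom_cdf_below(n, mu, threshold):
    """Exact Pr[X < threshold] for X ~ Binomial(n, mu)."""
    upper = math.ceil(threshold)  # integers strictly below threshold are 0..upper-1
    return sum(math.comb(n, j) * mu**j * (1 - mu) ** (n - j) for j in range(upper))


n, mu = 100, 0.3

# Multiplicative lower tail with lambda = 1/2.
lam = 0.5
exact_mult = binom_cdf_below(n, mu, (1 - lam) * mu * n)
bound_mult = math.exp(-mu * n * lam**2 / 2)

# Additive lower tail with lambda = 10.
lam_add = 10
exact_add = binom_cdf_below(n, mu, mu * n - lam_add)
bound_add = math.exp(-2 * lam_add**2 / n)

print(exact_mult, bound_mult)
print(exact_add, bound_add)
```

In both cases the exact tail probability sits well below the bound, which is typical: the bounds are convenient for proofs rather than tight.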

Lemma G.3. (Concentration of Geometric Random Variables) Let $Y_i$ be n i.i.d. Geometric random
variables with expectation $\mu$ each. Define $Y = \sum_{i=1}^{n} Y_i$. Then, for $\lambda > 0$,

$\Pr[Y > (1 + \lambda)\mu n] < \exp(-2n\lambda^2)$.



Lemma G.3 can be found, for example, in Chapter 2 of [Dubhashi and Panconesi 2009].